Loops, iterators and generators
Loops are one of the most common programming constructs. In Python, though, a number of subtle loop behaviors are essential to understanding some of the language's more advanced features. Let's start by exploring the typical for loop syntax applied to a Python string, list and dictionary, as illustrated in listing A-17.
Listing A-17. Python implicit iterator behavior
astring = 'coffee'
coffee_types = ['Cappuccino','Latte','Macchiato']
address = {'city':'San Diego','state':'CA','country':'US'}

# Standard for loop, with implicit iterator
for letter in astring:
    print(letter)

# Loop with enumerator (counter)
for counter,letter in enumerate(astring):
    print(counter,letter)

# Standard for loop, with implicit iterator
for coffee in coffee_types:
    print(coffee)

# Loop with enumerator (counter), starting at 1
for counter,coffee in enumerate(coffee_types,start=1):
    print(counter,coffee)

# Standard for loop, with implicit iterator
for key,value in address.items():
    print(key,value)

# Standard for loop, with implicit iterator
for address_key in address.keys():
    print(address_key)

# Standard for loop, with implicit iterator
for address_value in address.values():
    print(address_value)
As you can see in listing A-17, the standard for loop syntax in Python is for <item> in <container>. In each case, the for loop steps uninterrupted through every <item> in the <container>, where the <item> varies depending on the <container> (e.g. for a string <container> the <items> are letters, for a list <container> the <items> are list elements).
In addition to the standard for syntax, listing A-17 also makes use of a few helpers. The first one is the built-in Python function enumerate(), which functions as a counter. In the second and fourth examples you can see the <container> is wrapped in enumerate(), which gives the loop access to a counter variable that's declared alongside the <item> variable. By default, the enumerate() counter starts at 0, but it can be set to start at any number with the start argument. In addition, listing A-17 also makes use of the dictionary items(), keys() and values() methods, which expose a dictionary's key-value pairs, keys and values, respectively. In Python 3 these methods return view objects rather than lists, but you can loop over them in the same way.
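To see what these helpers hand to the loop, the following minimal sketch collects their results into lists instead of printing them one at a time. It reuses the variables from listing A-17; the numbered name is introduced here only for illustration.

```python
coffee_types = ['Cappuccino', 'Latte', 'Macchiato']
address = {'city': 'San Diego', 'state': 'CA', 'country': 'US'}

# enumerate() pairs each item with a counter; start=1 shifts the first count
numbered = list(enumerate(coffee_types, start=1))
print(numbered)

# items(), keys() and values() are wrapped in list() only to display them
print(list(address.items()))
print(list(address.keys()))
print(list(address.values()))
```

Each element produced by enumerate() and items() is a two-item tuple, which is why the for loops in listing A-17 can unpack it into two variables.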
Although the examples in listing A-17 are straightforward, there is more going on behind the scenes. When you create for loops like the ones in listing A-17, Python obtains a construct called an iterator from the <container> on which you're creating the loop. To advance through the <container>, Python calls the iterator's __next__ method to walk through each of the items, and the loop ends when the iterator signals that no items are left. Because there's no syntax related to the iterator or the __next__ method, these for loops are said to use implicit iterators.
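As a rough sketch of that behind-the-scenes work, a for loop over a string can be written out by hand with the built-in iter() and next() functions. This is an approximation for illustration, not the interpreter's actual implementation; the letters list is a hypothetical name used to collect the results.

```python
astring = 'coffee'

# Roughly what "for letter in astring: letters.append(letter)" does
letters = []
iterator = iter(astring)             # ask the container for an iterator
while True:
    try:
        letter = next(iterator)      # equivalent to iterator.__next__()
    except StopIteration:
        break                        # a for loop ends silently at this point
    letters.append(letter)

print(letters)
```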
Iterators are such an important concept in Python that they're first class citizens with built-in support in the language, just like strings, integers, lists and dictionaries. In addition, to work with for loops, other data types must implement what's called the iterator protocol: the data type provides an __iter__ method that returns an iterator, and the iterator provides a __next__ method. This is why you can automatically invoke a for loop on strings, lists, dictionaries and other data structures without actually seeing an iterator or __next__ method.
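To make the iterator protocol concrete, here is a minimal sketch of a custom class that a for loop can consume. The Countdown class and its attribute names are hypothetical, invented for this illustration; the only parts Python requires are the __iter__ and __next__ methods and the StopIteration exception.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration      # signals that no items are left
        value = self.current
        self.current -= 1
        return value

countdown_values = []
for number in Countdown(3):
    countdown_values.append(number)

print(countdown_values)
```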
To further illustrate the concept of iterators, let's explore a set of similar examples to the ones in listing A-17 but this time with explicit iterators.
Listing A-18. Python explicit iterator behavior
astring = 'coffee'
coffee_types = ['Cappuccino','Latte','Macchiato']

# Create explicit iterators
astring_iter = iter(astring)
coffee_types_iter = iter(coffee_types)

# Print iterator types
print(type(astring_iter))
print(type(coffee_types_iter))

# Call built-in next() to advance over the iterator
# Or iterator's __next__() works the same (e.g. astring_iter.__next__() )
>>> next(astring_iter)
'c'
>>> next(astring_iter)
'o'
>>> next(astring_iter)
'f'
>>> next(astring_iter)
'f'
>>> next(astring_iter)
'e'
>>> next(astring_iter)
'e'
>>> next(astring_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# Call built-in next() to advance over the iterator
# Or iterator's __next__() works the same (e.g. coffee_types_iter.__next__() )
>>> next(coffee_types_iter)
'Cappuccino'
>>> next(coffee_types_iter)
'Latte'
>>> next(coffee_types_iter)
'Macchiato'
>>> next(coffee_types_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# Create an orders number iterator
order_numbers = iter([1,2,3,4,5,6,7,8,9,10])

# Get order number
next(order_numbers)
# Do other stuff
next(order_numbers)
# Do other stuff
next(order_numbers)
The first two lines in listing A-18 declare a string and a list. Next, you can see both values are wrapped with the built-in Python function iter(), which creates an explicit iterator. To advance or walk through an iterator, you use the Python built-in function next() on the iterator reference, or you can also call the iterator's __next__() method. As you can see in listing A-18, each time next() is called on the iterator reference, it advances to the next item just like a for loop, until it reaches the end and raises the StopIteration exception when no more items are present.
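When walking an iterator by hand, the StopIteration exception has to be dealt with somewhere. The sketch below shows two common options: catching the exception, or passing a second argument to next() that is returned once the iterator is exhausted. The 'sold out' placeholder and the variable names are assumptions made for this example.

```python
coffee_types_iter = iter(['Cappuccino', 'Latte'])

# Option 1: give next() a default that is returned when no items remain
first = next(coffee_types_iter, 'sold out')
second = next(coffee_types_iter, 'sold out')
third = next(coffee_types_iter, 'sold out')    # iterator is exhausted here

# Option 2: catch StopIteration explicitly
try:
    next(coffee_types_iter)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```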
Now that you've seen both implicit and explicit iterators, you might be left wondering what the point is of explicit iterators and calling next() to advance each time. The point is that explicit iterators give you the opportunity to pause and get the next element whenever you want, whereas for loops with implicit iterators just run through the elements uninterrupted.
This opportunity to pause and get the next element when you need it is a powerful feature, which is best illustrated with the last example in listing A-18. You can see at the end of listing A-18 that the order_numbers iterator is created from a list of numbers. Next, a call is made to the next() function on the order_numbers iterator to fetch the next item only when it's needed, giving us the ability to continue doing other work in between fetching items.
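The following sketch expands that idea slightly: an iterator remembers its position, so order numbers can be handed out at different points in a program without tracking an index. The customer names and the receipts list are hypothetical, introduced only for illustration.

```python
order_numbers = iter([1, 2, 3, 4, 5])
receipts = []

# Two customers arrive; each gets the next available order number
for customer in ['Ana', 'Ben']:
    receipts.append((customer, next(order_numbers)))

# ... other work can happen here; the iterator keeps its position ...

# A third customer arrives later and continues where the iterator left off
receipts.append(('Cy', next(order_numbers)))

# Whatever was never requested is still waiting in the iterator
remaining = list(order_numbers)

print(receipts)
print(remaining)
```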
As powerful a concept as pausing an iteration is, the last iterator example in listing A-18 has some flaws. First, it relies on a hard-coded list of values that won't scale to hundreds or thousands of items, which is problematic both in terms of hard-coding the values and of keeping them all in memory. In addition, using the iter() function and declaring everything as a global iterator variable can lead to debugging and scoping issues. A cleaner and more efficient solution to achieve the same result is using a generator.
A generator is an iterator embodied as a function. In a standard function, like those described in the previous section, you call the function and it runs uninterrupted until it finds return or reaches the end and returns None by default. In a generator function (generator functions are simply called generators) you can integrate pauses into a function: calling the function returns a generator, and each time next() is called on that generator, execution continues from where the previous call left off. In addition, generators have the ability to generate values on-demand, which is how they get their name, so they're much more efficient when it comes to handling large data ranges.
Listing A-19 illustrates a series of generators and their critical piece, which is Python's yield keyword.
Listing A-19. Python generator behavior
----------------------------------------------------------------------------
def funky_order_numbers():
    yield 100
    yield 350
    yield 575
    yield 700
    yield 950

order_numbers = funky_order_numbers()
print(type(order_numbers))

# Call built-in next() to advance over the iterator
# Or iterator's __next__() works the same (e.g. order_numbers.__next__() )
>>> next(order_numbers)
100
...
...
>>> next(order_numbers)
950
>>> next(order_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
----------------------------------------------------------------------------
def generate_order_numbers(n):
    for i in range(1,n):
        yield i

regular_order_numbers = generate_order_numbers(100)

# Call built-in next() to advance over the iterator
# Or iterator's __next__() works the same (e.g. regular_order_numbers.__next__() )
>>> next(regular_order_numbers)
1
...
...
>>> next(regular_order_numbers)
99
>>> next(regular_order_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
----------------------------------------------------------------------------
def generate_infinite_order_numbers(n):
    while True:
        n += 1
        yield n

infinite_order_numbers = generate_infinite_order_numbers(1000)

# Call built-in next() to advance over the iterator
# Or iterator's __next__() works the same (e.g. infinite_order_numbers.__next__() )
>>> next(infinite_order_numbers)
1001
...
...
# next() never reaches end due to infinite 'while True:' statement
The first generator in listing A-19, funky_order_numbers(), is deliberately unconventional in order to illustrate the use of yield, which is a combination of return and pause behavior. Once you assign a generator to a reference, you can start stepping through the generator with Python's built-in function next(), or you can also use the generator's __next__() method.
The first time a next() call is made, the generator runs until it gets to yield 100, where it returns 100 and pauses until next() is called again. On the second next() call the generator resumes and gets to yield 350, where it returns 350 and pauses until next() is called again. This process goes on until the value of the last yield statement has been returned; the following next() call runs off the end of the function and Python raises StopIteration, just like it does with iterators.
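One way to observe this pause and resume behavior is to record what happens inside and outside a generator in a shared list. The sketch below is a shortened, instrumented variant of funky_order_numbers(); the traced_order_numbers name and the events list are hypothetical additions for illustration.

```python
events = []

def traced_order_numbers():
    events.append('started')
    yield 100
    events.append('resumed after 100')
    yield 350
    events.append('resumed after 350')

order_numbers = traced_order_numbers()
events.append('created')             # the function body has not run yet

first = next(order_numbers)          # runs up to the first yield, then pauses
events.append('got 100')
second = next(order_numbers)         # resumes, runs up to the second yield
events.append('got 350')

try:
    next(order_numbers)              # resumes and reaches the end of the function
except StopIteration:
    events.append('exhausted')

for event in events:
    print(event)
```

Note that 'created' is recorded before 'started': calling a generator function only builds the generator, and none of its body runs until the first next() call.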
Using multiple yield statements and hard-coding values in a generator is rarely done, so the second generator in listing A-19 does more justice to the name generator. The generator generate_order_numbers(n) accepts an input number and then creates a loop from 1 up to, but not including, n, where each iteration runs yield i and i is the iteration number.
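Because range(1, n) stops before n, the last value this generator produces is n - 1. The sketch below shows that boundary next to a hypothetical inclusive variant, generate_order_numbers_inclusive(), which is not part of listing A-19 and is included only for comparison.

```python
def generate_order_numbers(n):
    # Same generator as listing A-19: range(1, n) stops before n
    for i in range(1, n):
        yield i

def generate_order_numbers_inclusive(n):
    # Hypothetical variant: range(1, n + 1) includes n itself
    for i in range(1, n + 1):
        yield i

exclusive = list(generate_order_numbers(5))
inclusive = list(generate_order_numbers_inclusive(5))

print(exclusive)
print(inclusive)
```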
As you can see in listing A-19, the first step is to assign the generator to a reference by initializing it with a number. Once the assignment is made, you walk through the generator by calling the next() function, and each time it returns the current counter until you reach the last value, which is one less than the initialization number (99 for an initialization number of 100). An interesting behavior of generators is that they generate their return values on-demand, so in this case even if you initialize the generator with a large number, say 10000000, the generator doesn't create a 10000000 element list that takes up memory; it generates and returns values as they're needed.
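A rough way to see the memory difference is to compare the size of a fully built list against the size of a generator object using sys.getsizeof(). This is only an approximate sketch: getsizeof() reports the size of the container object itself, not of the integers it refers to, and the exact byte counts vary between Python versions and platforms, so no specific numbers are shown. A smaller range than 10000000 is used to keep the example quick.

```python
import sys

def generate_order_numbers(n):
    for i in range(1, n):
        yield i

# The list holds every value at once; the generator holds only its state
as_list = list(range(1, 100000))
as_generator = generate_order_numbers(100000)

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)

print('list bytes:', list_size)
print('generator bytes:', generator_size)
```

The generator's size stays the same no matter how large n is, because the values are produced one at a time as next() is called.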
The third generator in listing A-19, generate_infinite_order_numbers(), is designed to never end. The generator generate_infinite_order_numbers(n) accepts an input number and then creates an infinite loop that increments n by 1 and runs yield n on each iteration. In the same way as the other generators, you call the next() function on the reference and each time it returns a subsequent value. Because of the while True: statement that represents an infinite loop, this generator never ends and thus never raises StopIteration.
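Since an infinite generator never raises StopIteration, looping over it with a plain for loop would never finish. One option, sketched below, is the standard library's itertools.islice(), which takes a fixed number of values and then stops. The first_three and next_two names are introduced here for illustration.

```python
from itertools import islice

def generate_infinite_order_numbers(n):
    while True:
        n += 1
        yield n

infinite_order_numbers = generate_infinite_order_numbers(1000)

# Take a finite number of values from the infinite generator
first_three = list(islice(infinite_order_numbers, 3))

# The generator keeps its position between calls
next_two = list(islice(infinite_order_numbers, 2))

print(first_three)
print(next_two)
```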