Loops, iterators and generators

Loops are one of the most common programming constructs. In Python, though, a number of subtle loop behaviors are essential to grasp in order to understand some of Python's more advanced features. Let's start by exploring the typical for loop syntax applied to a Python string, list and dictionary, as illustrated in listing A-17.

Listing A-17. Python implicit iterator behavior

astring = 'coffee'
coffee_types = ['Cappuccino','Latte','Macchiato']
address = {'city':'San Diego','state':'CA','country':'US'}

# Standard for loop, with implicit iterator
for letter in astring:
    print(letter)

# Loop with enumerator (counter)
for counter,letter in enumerate(astring):
    print(counter,letter)

# Standard for loop, with implicit iterator
for coffee in coffee_types:
    print(coffee)

# Loop with enumerator (counter), starting at 1
for counter,coffee in enumerate(coffee_types,start=1):
    print(counter,coffee)

# Standard for loop, with implicit iterator
for key,value in address.items():
    print(key,value)

# Standard for loop, with implicit iterator
for address_key in address.keys():
    print(address_key)

# Standard for loop, with implicit iterator
for address_value in address.values():
    print(address_value)

As you can see in listing A-17, the standard for loop syntax in Python is for <item> in <container>. In each case, the for loop steps uninterrupted through every <item> in the <container>, where the <item> varies depending on the <container> (e.g. for a string <container> the <items> are letters, for a list <container> the <items> are list elements).

In addition to the standard for syntax, listing A-17 also makes use of several helpers. The first one is the built-in Python function enumerate(), which functions as a counter. In the second and fourth examples you can see the <container> is wrapped in the enumerate() function, which gives the loop access to a counter variable that's declared alongside the <item> variable. By default, the enumerate() counter starts at 0, but it can be set to start at any number by using the start argument. In addition, listing A-17 also makes use of the dictionary items(), keys() and values() methods, which provide the dictionary's key-value pairs, keys and values, respectively.
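
To see what these helpers hand to a loop, the following minimal sketch materializes their results with list(). It reuses the variables from listing A-17 and assumes Python 3.7 or later, where dictionaries preserve insertion order; the names numbered, pairs, keys and values are introduced here only for illustration.

```python
coffee_types = ['Cappuccino', 'Latte', 'Macchiato']
address = {'city': 'San Diego', 'state': 'CA', 'country': 'US'}

# enumerate() produces (counter, item) tuples, here starting at 1
numbered = list(enumerate(coffee_types, start=1))
print(numbered)  # [(1, 'Cappuccino'), (2, 'Latte'), (3, 'Macchiato')]

# The dictionary methods, converted to lists only for display
pairs = list(address.items())     # (key, value) tuples
keys = list(address.keys())       # keys only
values = list(address.values())   # values only
print(pairs)
print(keys)
print(values)
```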

Although the examples in listing A-17 are straightforward, there is more going on behind the scenes. When you create for loops like the ones in listing A-17, Python obtains a construct called an iterator from the <container> you're looping over. To advance through the <container>, Python uses the iterator's __next__ method to walk through each of the items in the <container>. Because there's no syntax related to the iterator or __next__ method, these for loops are said to use implicit iterators.
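
As a rough sketch of that hidden machinery, the first loop from listing A-17 can be written out by hand. This is an approximation of what a for loop does, not the interpreter's actual implementation; it relies on the built-in functions iter() and next(), and the letters list is added here only to collect the results.

```python
astring = 'coffee'

letters = []
iterator = iter(astring)          # ask the container for an iterator
while True:
    try:
        letter = next(iterator)   # calls the iterator's __next__() method
    except StopIteration:         # raised when no items remain
        break
    letters.append(letter)        # the body of the for loop goes here

print(letters)  # ['c', 'o', 'f', 'f', 'e', 'e']
```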

Iterators are such an important concept in Python that they're first-class citizens with built-in support in the language, just like strings, integers, lists and dictionaries. To support iteration, a data type must implement what's called the iterator protocol, which is why you can run a for loop on strings, lists, dictionaries and other data structures without ever seeing an iterator or a __next__ method.
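
To make the iterator protocol more concrete, here is a minimal sketch of a custom class that implements it. The Countdown class is hypothetical and exists only for illustration: it defines __iter__(), which returns the iterator itself, and __next__(), which returns the next value or raises StopIteration when nothing is left.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__()
        return self

    def __next__(self):
        if self.current <= 0:
            # Signal that there are no more items
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

# Because Countdown follows the protocol, a for loop works on it directly
countdown_values = [number for number in Countdown(3)]
print(countdown_values)  # [3, 2, 1]
```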

To further illustrate the concept of iterators, let's explore a set of similar examples to the ones in listing A-17 but this time with explicit iterators.

Listing A-18. Python explicit iterator behavior

astring = 'coffee'
coffee_types = ['Cappuccino','Latte','Macchiato']

# Create explicit iterators
astring_iter = iter(astring)
coffee_types_iter = iter(coffee_types)

# Print iterator types
print(type(astring_iter))
print(type(coffee_types_iter))

# Call built-in next() to advance over the iterator 
# Or iterator's __next__() works the same (e.g. astring_iter.__next__() )
>>> next(astring_iter)
'c'
>>> next(astring_iter)
'o'
>>> next(astring_iter)
'f'
>>> next(astring_iter)
'f'
>>> next(astring_iter)
'e'
>>> next(astring_iter)
'e'
>>> next(astring_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# Call built-in next() to advance over the iterator 
# Or iterator's __next__() works the same (e.g. coffee_types_iter.__next__() )
>>> next(coffee_types_iter)
'Cappuccino'
>>> next(coffee_types_iter)
'Latte'
>>> next(coffee_types_iter)
'Macchiato'
>>> next(coffee_types_iter)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

# Create an orders number iterator
order_numbers = iter([1,2,3,4,5,6,7,8,9,10])

# Get order number
next(order_numbers)

# Do other stuff
next(order_numbers)

# Do other stuff
next(order_numbers)

The first two lines in listing A-18 declare a string and a list. Next, you can see both values are wrapped with the built-in Python function iter(), which is used to create an explicit iterator. To advance or walk through an iterator, you use the Python built-in function next() on the iterator reference -- or you can also use the iterator's __next__() method. As you can see in listing A-18, each time next() is called on the iterator reference, it advances to the next item just like a for loop, until it reaches the end and raises StopIteration when no more items are present.
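
If you'd rather not handle StopIteration yourself, the built-in next() function also accepts an optional second argument that is returned once the iterator is exhausted. The following short sketch shows this, using None as an arbitrary default value chosen for illustration.

```python
coffee_types_iter = iter(['Cappuccino', 'Latte', 'Macchiato'])

first = next(coffee_types_iter, None)
second = next(coffee_types_iter, None)
third = next(coffee_types_iter, None)
# The iterator is now exhausted, so the default is returned
# instead of raising StopIteration
fourth = next(coffee_types_iter, None)

print(first, second, third, fourth)  # Cappuccino Latte Macchiato None
```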

Now that you've seen both implicit and explicit iterators, you might be left wondering what the point is of explicit iterators and calling next() to advance each time. The point is that explicit iterators give you the opportunity to pause and get the next element whenever you want, whereas for loops with implicit iterators just run through the elements uninterrupted.

This opportunity to pause and get the next element when you need it is a powerful feature, which is best illustrated with the last example in listing A-18. You can see at the end of listing A-18 that the order_numbers iterator is created from a list of numbers. Next, a call is made to the next() function on the order_numbers iterator to fetch the next item only when it's needed, giving us the ability to continue doing other work in between fetching items.
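
As a small sketch of this idea, the "other stuff" between next() calls might look like the following. The take_order() helper and the customer names are hypothetical, introduced only to show that the iterator keeps its position between calls.

```python
order_numbers = iter([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def take_order(customer):
    # Hypothetical helper: fetch the next order number only when needed
    return f"Order {next(order_numbers)} for {customer}"

receipts = []
receipts.append(take_order('Ana'))
# ... other work can happen here; the iterator remembers where it stopped ...
receipts.append(take_order('Ben'))

print(receipts)  # ['Order 1 for Ana', 'Order 2 for Ben']
```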

As powerful a concept as pausing an iteration is, the last iterator example in listing A-18 has some flaws. First, it relies on a hard-coded list of values that won't scale to hundreds or thousands of items, which is problematic both in terms of hard-coding the values and of keeping them all in memory. In addition, using the iter() function and declaring everything as a global iterator variable can lead to debugging and scoping issues. A cleaner and more efficient solution to achieve the same result is to use a generator.

A generator is an iterator embodied as a function. In a standard function -- like those described in the previous section -- you call the function and it runs uninterrupted until it finds return or reaches the end and returns None by default. In a generator function -- simply called a generator -- you can integrate pauses into the function, so that each next() call continues from where the previous one left off. In addition, generators have the ability to generate values on-demand -- which is how they get their name -- so they're much more efficient when it comes to handling large data ranges.

Listing A-19 illustrates a series of generators and their critical piece which is Python's yield keyword.

Listing A-19. Python generator behavior

----------------------------------------------------------------------------
def funky_order_numbers():
    yield 100
    yield 350
    yield 575
    yield 700
    yield 950

order_numbers = funky_order_numbers()
print(type(order_numbers))

# Call built-in next() to advance over the iterator 
# Or iterator's __next__() works the same (e.g. order_numbers.__next__() )
>>> next(order_numbers)
100
...
...
>>> next(order_numbers)
950
>>> next(order_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

----------------------------------------------------------------------------
def generate_order_numbers(n):
    for i in range(1,n):
        yield i

regular_order_numbers = generate_order_numbers(100)

# Call built-in next() to advance over the iterator 
# Or iterator's __next__() works the same (e.g. regular_order_numbers.__next__() )
>>> next(regular_order_numbers)
1
...
...
>>> next(regular_order_numbers)
99
>>> next(regular_order_numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

----------------------------------------------------------------------------
def generate_infinite_order_numbers(n):
    while True:
        n += 1
        yield n

infinite_order_numbers = generate_infinite_order_numbers(1000)

# Call built-in next() to advance over the iterator 
# Or iterator's __next__() works the same (e.g. infinite_order_numbers.__next__() )
>>> next(infinite_order_numbers)
1001
...
...
# next() never reaches end due to infinite 'while True:' statement

The first generator in listing A-19, funky_order_numbers(), is unconventional, but it serves to illustrate the use of yield, which is a combination of return & stop behavior. Once you assign a generator to a reference, you can start stepping through the generator with Python's built-in function next() -- or you can also use the generator's __next__() method.

The first time a next() call is made, the generator gets to yield 100, where it returns 100 and stops until next() is called again. On the second next() call the generator gets to yield 350, where it returns 350 and stops until next() is called again. This process goes on until there are no more yield statements: the next() call made after the last yield raises StopIteration -- just like it does with iterators.
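
The return & stop behavior can be observed with a small tracing sketch. The traced_order_numbers() generator and the log list below are hypothetical additions for illustration; they record how far the function body has run after each next() call.

```python
log = []

def traced_order_numbers():
    # Hypothetical variant of funky_order_numbers() that records its progress
    log.append('started')
    yield 100
    log.append('resumed after 100')
    yield 350
    log.append('finished')

order_numbers = traced_order_numbers()
log_before = list(log)          # calling the generator runs none of its body

first = next(order_numbers)     # runs up to the first yield
log_after_first = list(log)

second = next(order_numbers)    # resumes right after 'yield 100'
log_after_second = list(log)

print(log_before)               # []
print(first, log_after_first)   # 100 ['started']
print(second, log_after_second) # 350 ['started', 'resumed after 100']
```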

Using multiple yield statements and hard-coding values in a generator is rarely done, so the second generator in listing A-19 does more justice to the name generator. The generator generate_order_numbers(n) accepts an input number n and then loops from 1 up to, but not including, n -- because range(1,n) excludes its end value -- where each iteration yields i, the current iteration number.

As you can see in listing A-19, the first step is to assign the generator to a reference by initializing it with a number. Once the assignment is made, you walk through the generator by calling the next() function, and each time it returns the current counter, until you reach the number just below the initialization number (99 for an initialization number of 100). An interesting behavior of generators is that they generate their return values on-demand, so in this case even if you initialize the generator with a large number, say 10000000, the generator doesn't create a 10000000 element list that takes up memory; it generates and returns values as they're needed.
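
The memory difference can be checked roughly with sys.getsizeof() from the standard library. This is only a sketch: the exact byte counts vary by Python implementation and version, and getsizeof() measures the container object itself rather than the numbers it references. The names as_list, as_generator and total are introduced here for illustration.

```python
import sys

def generate_order_numbers(n):
    for i in range(1, n):
        yield i

# A list holds a reference to every element up front
as_list = list(range(1, 1000000))
# A generator only holds the state needed to produce the next value
as_generator = generate_order_numbers(1000000)

list_size = sys.getsizeof(as_list)
generator_size = sys.getsizeof(as_generator)
print(list_size, generator_size)

# A generator can also be consumed by a for loop or functions like sum()
total = sum(generate_order_numbers(101))
print(total)  # 5050
```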

The third generator in listing A-19, generate_infinite_order_numbers(), is designed to never end. The generator generate_infinite_order_numbers(n) accepts an input number and then creates an infinite loop that increments n by 1 and yields it on each iteration. In the same way as the other generators, you call the next() function on the reference and each time it returns a subsequent value. Because of the while True: statement that represents an infinite loop, this generator never ends and thus never raises StopIteration.
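
Because an infinite generator never raises StopIteration, passing it to list() or to a for loop without a break would never finish. One way to take a bounded number of values, sketched below, is islice() from the standard library's itertools module; the names first_three and next_number are introduced here only for illustration.

```python
from itertools import islice

def generate_infinite_order_numbers(n):
    while True:
        n += 1
        yield n

infinite_order_numbers = generate_infinite_order_numbers(1000)

# Take only the first three values from the infinite generator
first_three = list(islice(infinite_order_numbers, 3))
print(first_three)  # [1001, 1002, 1003]

# The generator keeps its position, so the next value continues the sequence
next_number = next(infinite_order_numbers)
print(next_number)  # 1004
```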