Python: How to create a generator function
As you might already know, a function returns its result at the end of its execution. It's a super handy feature, but it has two potential disadvantages:
- You must wait for the full result even when partial results would be enough to start working. For example, if a function generates millions of numbers, you have to wait for it to finish, although you could process each number as soon as it is produced.
- Waiting for the function to return millions of numbers also means that all of those numbers need to be stored in memory, which can be resource-intensive.
The solution to both problems is the yield statement, which turns a simple function into a generator function. Let's first see an example with a non-generator function.
The following script generates 10 million integers and, once all of them have been generated, loops over them again to do some calculations:

#!/usr/bin/env python3
from random import randint


def generate_10m_nums():
    """
    This function will generate 10m numbers
    """
    temp = []
    for i in range(0, 10000000):
        # Every number is kept in the list until the function returns
        temp.append(randint(0, 10))
    return temp


if __name__ == '__main__':
    nums = generate_10m_nums()
    # Second loop: runs only after all the numbers exist
    for i in nums:
        i = i % 2

Testing the script, we see that it needs about 10 seconds, 100% CPU usage and 88 MB of RAM:

$ /usr/bin/time -v ./simple_function.py
Command being timed: "./simple_function.py"
User time (seconds): 10.20
System time (seconds): 0.07
Percent of CPU this job got: 100%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.27
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 88016
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 11554
Voluntary context switches: 1
Involuntary context switches: 10
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Let's make some modifications to our script and use the yield statement:

#!/usr/bin/env python3
from random import randint


def generate_10m_nums():
    """
    This function will generate 10m numbers
    """
    for i in range(0, 10000000):
        # Hand each number to the caller as soon as it is generated
        yield randint(0, 10)


if __name__ == '__main__':
    for i in generate_10m_nums():
        i = i % 2

The generate_10m_nums() function has changed: there is no longer a list named temp to which we append each generated number. Instead, we “yield” the random number each time it is generated:

yield randint(0, 10)

This makes each number available to the for statement, which calculates the modulo as soon as the number is available, without waiting for all the numbers to be generated:

for i in generate_10m_nums():
    i = i % 2

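To see this behaviour at a smaller scale, here is a minimal sketch that steps through a generator by hand with the built-in next(). The count_up_to() function and the log list are made up for this illustration and are not part of the scripts in this article; the list only records when the body of the generator actually runs.

```python
def count_up_to(limit, log):
    """Yield 1..limit, noting in `log` each time a value is produced."""
    for n in range(1, limit + 1):
        log.append(f"producing {n}")
        yield n


log = []
gen = count_up_to(3, log)    # nothing runs yet, we only get a generator object
log_before = list(log)       # still empty at this point

first = next(gen)            # runs the body up to the first yield
second = next(gen)           # resumes right after the first yield

print(log_before)
print(first, second)
print(log)
```

Calling the function does not execute its body; each next() call (which is what a for loop does behind the scenes) runs just enough code to produce one more value.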
Executing the generator version of the script gives the following performance results:

$ /usr/bin/time -v ./generator_function.py
Command being timed: "./generator_function.py"
User time (seconds): 8.72
System time (seconds): 0.00
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.72
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9552
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1121
Voluntary context switches: 1
Involuntary context switches: 7
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
We can notice the following:
- User time: 8.72 seconds, a slight improvement over the previous script, which needed 10.20 seconds. This is because there is only one loop instead of two.
- Maximum resident set size (kbytes): 9552, a great improvement over the previous script's 88016. Memory usage is about 9 times lower because there is no list holding the numbers for processing; each number is processed as soon as it is generated.
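The memory difference can also be observed from inside Python. The sketch below is only a rough illustration: squares_list() and squares_gen() are hypothetical helpers, not the functions from this article, and sys.getsizeof() reports the size of the container object alone, not of the integers it refers to.

```python
import sys


def squares_list(n):
    """Build and return the whole list at once."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yield one value at a time instead of storing them."""
    for i in range(n):
        yield i * i


as_list = squares_list(100_000)
as_gen = squares_gen(100_000)

# The list grows with the number of items; the generator object stays small
# because it only remembers where it paused.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(list_bytes, gen_bytes)

# Both produce the same values, but a generator can be consumed only once.
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)
leftover = list(as_gen)      # empty: the generator is already exhausted
```

The trade-off is that a generator cannot be indexed or iterated a second time, so a list is still the right choice when the values are needed more than once.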
I hope this article gave you a simple way to understand generators and how to use them :)