Python itertools: an overview
The itertools module contains fast, memory-efficient functions for working with sequences of data. They consume less memory than lists because values are produced only when needed, so the whole data set never has to be held in memory at the same time. This is comparable to lazy-loading images on web pages to optimize the critical rendering path. For large inputs it can also improve performance because less swapping is required.
Below are descriptions and code examples for the functions in the itertools module. Itertools functions generally return an iterator. To show the resulting elements, the output in the following examples has been wrapped in the list() function. To use itertools, import the module with import itertools.
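As a rough illustration of the memory claim above, the following sketch compares a fully built list with the equivalent lazy iterator. The variable names and the value of n are arbitrary choices for this example, and sys.getsizeof() measures only the container object itself, so the exact byte counts will differ between Python builds.

```python
import itertools
import sys

n = 100_000  # arbitrary size chosen for this sketch

# Eager: every running total is computed and stored up front.
eager = list(itertools.accumulate(range(n)))
# Lazy: nothing is computed until a value is requested.
lazy = itertools.accumulate(range(n))

list_bytes = sys.getsizeof(eager)  # size of the list object only
iter_bytes = sys.getsizeof(lazy)   # size of the iterator object
print(iter_bytes < list_bytes)
# output: True

# Only the values that are actually requested get produced.
first_three = list(itertools.islice(lazy, 3))
print(first_three)
# output: [0, 1, 3]
```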
accumulate()
Returns the running totals of the elements in an iterable. By default the elements are combined with the + operator, so numbers are summed and strings are concatenated.
list1 = ('ABCD')
list2 = (1, 2, 3, 4)
print(list(itertools.accumulate(list1)))
# output: ['A', 'AB', 'ABC', 'ABCD']
print(list(itertools.accumulate(list2)))
# output: [1, 3, 6, 10]
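Addition is only the default. As a small sketch, accumulate() also accepts a two-argument function as its second argument, and (in Python 3.8 and later) a keyword-only initial value. The sample data and variable names below are made up for illustration.

```python
import itertools
import operator

values = [3, 1, 4, 1, 5]

# Running product instead of running sum.
running_product = list(itertools.accumulate(values, operator.mul))
print(running_product)
# output: [3, 3, 12, 12, 60]

# Any two-argument function works, for example a running maximum.
running_max = list(itertools.accumulate(values, max))
print(running_max)
# output: [3, 3, 4, 4, 5]

# initial adds a starting value, so the result has one extra element.
with_initial = list(itertools.accumulate(values, initial=100))
print(with_initial)
# output: [100, 103, 104, 108, 109, 114]
```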
chain()
Takes multiple iterables as arguments and chains them together to produce a single iterator. This makes it easy to process several sequences without first having to construct one large list.
iterable1 = (1, 2, 3)
iterable2 = ('DEF')
print(list(itertools.chain(iterable1, iterable2)))
# output: [1, 2, 3, 'D', 'E', 'F']
chain.from_iterable()
Takes a single iterable of iterables, such as a nested list, and flattens it by one level.
iterable3 = ([[1, 2, 3], ['D', 'E', 'F']])
print(list(itertools.chain.from_iterable(iterable3)))
# output: [1, 2, 3, 'D', 'E', 'F']
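One detail worth showing is that chain.from_iterable() removes only one level of nesting, and that its argument can itself be lazy. The data in this sketch is invented for illustration.

```python
import itertools

# Only the outer level is flattened; the inner [2, 3] stays a list.
nested = [[1, [2, 3]], [4]]
flat_once = list(itertools.chain.from_iterable(nested))
print(flat_once)
# output: [1, [2, 3], 4]

# The outer iterable can be a generator that yields iterables on demand.
rows = (range(i) for i in range(1, 4))
flat_rows = list(itertools.chain.from_iterable(rows))
print(flat_rows)
# output: [0, 0, 1, 0, 1, 2]
```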
combinations()
Returns all possible combinations of a given length with no repeated elements for a given iterable. The length can also be given using the r-argument.
list1 = ('a', 'b', 'c')
list2 = (1, 2, 3)
print(list(itertools.combinations(list1, len(list1))))
# output: [('a', 'b', 'c')]
print(list(itertools.combinations(list1, len(list1) - 1)))
# output: [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(list(itertools.combinations(list2, len(list2))))
# output: [(1, 2, 3)]
print(list(itertools.combinations(list2, r=2)))
# output: [(1, 2), (1, 3), (2, 3)]
combinations_with_replacement()
Returns all possible combinations of a given length including repeated elements for a given iterable. The length can also be given using the r-argument.
list1 = ('a', 'b', 'c')
list2 = (1, 2, 3)
print(list(
    itertools.combinations_with_replacement(list1, len(list1))))
# output:
# [('a', 'a', 'a'), ('a', 'a', 'b'), ('a', 'a', 'c'),
#  ('a', 'b', 'b'), ('a', 'b', 'c'), ('a', 'c', 'c'),
#  ('b', 'b', 'b'), ('b', 'b', 'c'), ('b', 'c', 'c'),
#  ('c', 'c', 'c')]
print(list(
    itertools.combinations_with_replacement(list1, r=2)))
# output:
# [('a', 'a'), ('a', 'b'), ('a', 'c'),
#  ('b', 'b'), ('b', 'c'),
#  ('c', 'c')]
print(list(
    itertools.combinations_with_replacement(list2, len(list2))))
# output:
# [(1, 1, 1), (1, 1, 2), (1, 1, 3),
#  (1, 2, 2), (1, 2, 3), (1, 3, 3),
#  (2, 2, 2), (2, 2, 3), (2, 3, 3), (3, 3, 3)]
print(list(
    itertools.combinations_with_replacement(list2, r=2)))
# output:
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
compress()
Filters an iterable using Boolean values from another iterable. A true selector causes the corresponding value from the data iterable to be included in the result, while a false selector causes it to be skipped. Stops when either the data or the selectors iterable has been exhausted.
input_list = ['A', 'B', 'C', 'D', 'E']
selector = [True, False, False, True, True]
print(list(itertools.compress(input_list, selector)))
# output: ['A', 'D', 'E']
list2 = ['A', 'B', 'C', 'D', 'E', 'F']
selector2 = [1, 1, 0]
print(list(itertools.compress(list2, selector2)))
# output: ['A', 'B']
count()
Known as an infinite iterator, itertools.count() returns evenly spaced values indefinitely. The default start and step values are 0 and 1 respectively; they can be changed with the first argument, start, and the second argument, step. To count down, pass a negative step. Because the iterator never ends on its own, a loop over it needs an exit condition, such as an if statement combined with break. In the examples below, print() writes each value on a new line.
for i in itertools.count():
    print(i)
    if i > 5:
        break
# output: 0 1 2 3 4 5 6 (each number on a new line)
for i in itertools.count(3, 4):
    print(i)
    if i > 8:
        break
# output: 3 7 11 (each number on a new line)
for i in itertools.count(10, -1):
    print(i)
    if i < 7:
        break
# output: 10 9 8 7 6 (each number on a new line)
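An explicit break is not the only way to stop an infinite counter. As a sketch, count() can be bounded with islice() or paired with a finite iterable through zip(), which stops at the shortest input. The variable names are illustrative only.

```python
import itertools

# islice() takes the first three values and then stops.
first_values = list(itertools.islice(itertools.count(3, 4), 3))
print(first_values)
# output: [3, 7, 11]

# zip() ends as soon as the finite list is exhausted.
numbered = list(zip(itertools.count(1), ['a', 'b', 'c']))
print(numbered)
# output: [(1, 'a'), (2, 'b'), (3, 'c')]
```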
cycle()
Another infinite iterator, itertools.cycle() returns an iterator that repeats the contents of the iterable it is given indefinitely. It can consume quite a bit of memory if the input is long, as it has to store a copy of every element. An exit condition, such as a counter or a running total, can be used to break out of the loop.
list1 = [1, 10, 100]
list2 = ['a', 'b', 'c']
sum_value = 0
for i in itertools.cycle(list1):
    print(i)
    sum_value += i
    if sum_value > 300:
        break
# output:
# 1 10 100 1 10 100 1 10 100 (each number on a new line)
for i in zip(range(5), itertools.cycle(list2)):
    print(i)
# output:
# (0, 'a')
# (1, 'b')
# (2, 'c')
# (3, 'a')
# (4, 'b')
dropwhile()
Returns an iterator that skips elements of the input iterator for as long as a condition is true. Once the condition is false for the first time, that element and all of the remaining items in the input are returned without being tested again.
int_list = [0, 1, 2, 3, 4, 5, 6, -5]
result = list(
    itertools.dropwhile(lambda x: x < 3, int_list))
print(result)
# output: [3, 4, 5, 6, -5]
def doesnt_contain_character(string):
    substring = 'a'
    return substring not in string
string_list = ['To', 'boldly', 'go', 'Tea', 'Earl', 'Grey', 'hot']
print(list(
    itertools.dropwhile(doesnt_contain_character, string_list)))
# output: ['Tea', 'Earl', 'Grey', 'hot']
filterfalse()
Returns an iterator that includes only items from the input iterator where the test function returns false.
int_list = [1, 7, 0, 1, 6, 5, 2]
result = list(
    itertools.filterfalse(lambda x: x < 5, int_list))
print(result)
# output: [7, 6, 5]
groupby()
Creates an iterator that returns consecutive keys and groups from the input iterable. A new group starts every time the value of the key changes; by default the key is the element itself. To collect all equal elements into a single group, the input iterable needs to be sorted by the same key first.
str_list = 'AAABBACCC'
result1 = itertools.groupby(str_list)
for key, iter_item in result1:
    print(f"Key: {key}")
    for item in iter_item:
        print(item, end=" ")
    print()
# output:
# Key: A
# A A A
# Key: B
# B B
# Key: A
# A
# Key: C
# C C C
anagrams = ['alert', 'alter', 'later', 'beard', 'bared',
            'bread', 'debar', 'loop', 'polo', 'pool']
# sorted(anagrams, key=sorted) orders the words by their sorted
# letters, so anagrams of each other end up next to each other
# sorted is also passed as the key function to groupby,
# so words with the same sorted letters fall into one group
# a list comprehension saves the grouped anagrams as lists
grouped_anagrams = [
    list(group)
    for key, group in itertools.groupby(
        sorted(anagrams, key=sorted), sorted)]
print(grouped_anagrams)
# output:
# [['beard', 'bared', 'bread', 'debar'],
#  ['alert', 'alter', 'later'],
#  ['loop', 'polo', 'pool']]
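The effect of sorting before grouping can be seen by comparing the keys that groupby() produces in each case. This is a minimal sketch using the same 'AAABBACCC' string as above; the variable names are chosen for illustration.

```python
import itertools

letters = 'AAABBACCC'

# Without sorting, 'A' shows up as two separate groups.
unsorted_keys = [key for key, _ in itertools.groupby(letters)]
print(unsorted_keys)
# output: ['A', 'B', 'A', 'C']

# After sorting, every letter forms exactly one group.
counts = {key: len(list(group))
          for key, group in itertools.groupby(sorted(letters))}
print(counts)
# output: {'A': 4, 'B': 2, 'C': 3}
```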
islice()
Creates an iterator that returns selected elements from an input iterable, given the start, stop and step arguments. If start is None, iteration starts at zero. If step is None, the step defaults to one. Does not accept negative values for any of the arguments. It is more memory-efficient than regular index slicing because it iterates over the existing iterable instead of creating a new sequence.
str_list = 'AABBCCDDEEFF'
result = list(itertools.islice(str_list, 2, 9, 3))
print(result)
# output: ['B', 'C', 'E']
for i in itertools.islice(range(100), 10, 100, 10):
    print(i, end=' ')
# output: 10 20 30 40 50 60 70 80 90
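islice() is most useful where index slicing is not available at all, for example on generators. The sketch below uses a hypothetical squares() generator, written only for this example, to show the difference.

```python
import itertools

def squares():
    """Hypothetical infinite generator used only for this example."""
    n = 0
    while True:
        yield n * n
        n += 1

# Generators do not support index slicing.
try:
    squares()[2:5]
    slicing_works = True
except TypeError:
    slicing_works = False
print(slicing_works)
# output: False

# islice() selects positions 2, 3 and 4 and then stops iterating.
window = list(itertools.islice(squares(), 2, 5))
print(window)
# output: [4, 9, 16]
```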
pairwise()
Returns successive overlapping pairs taken from the input iterable. Useful for iterating over an iterable with a rolling window of two elements. Requires Python 3.10 or later.
str_list = ('ABCDE')
result = list(itertools.pairwise(str_list))
print(result)
# output: [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
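On Python versions older than 3.10, similar behavior can be approximated with tee() and zip(). The helper name pairwise_fallback below is an assumption made for this sketch and is not part of itertools.

```python
import itertools

def pairwise_fallback(iterable):
    """Yield overlapping pairs; a stand-in for itertools.pairwise()."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second copy by one element
    return zip(first, second)

pairs = list(pairwise_fallback('ABCDE'))
print(pairs)
# output: [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')]
```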
permutations()
Returns all possible permutations (orderings) of the elements of an iterable, without repeating an element within a permutation.
list1 = ('cat', 'mouse', 'dog')
list2 = (1, 2, 3)
print(list(itertools.permutations(list1)))
# output:
# [('cat', 'mouse', 'dog'),
#  ('cat', 'dog', 'mouse'),
#  ('mouse', 'cat', 'dog'),
#  ('mouse', 'dog', 'cat'),
#  ('dog', 'cat', 'mouse'),
#  ('dog', 'mouse', 'cat')]
print(list(itertools.permutations(list2)))
# output:
# [(1, 2, 3),
#  (1, 3, 2),
#  (2, 1, 3),
#  (2, 3, 1),
#  (3, 1, 2),
#  (3, 2, 1)]
The r-argument can be used to limit the length of the returned tuples:
print(list(itertools.permutations(list2, r=2)))
# output:
# [(1, 2),
#  (1, 3),
#  (2, 1),
#  (2, 3),
#  (3, 1),
#  (3, 2)]
product()
Can replace nested for loops that iterate over multiple sequences by producing a single iterator whose values are the Cartesian product of the given iterables.
list1 = ('a', 'b', 'c')
list2 = (1, 2)
list3 = (10, 15, 20)
list4 = (5, 8)
print(list(itertools.product(list1, list2)))
# output:
# [('a', 1), ('a', 2),
#  ('b', 1), ('b', 2),
#  ('c', 1), ('c', 2)]
print(list(itertools.product(list3, list4)))
# output:
# [(10, 5), (10, 8),
#  (15, 5), (15, 8),
#  (20, 5), (20, 8)]
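To make the nested-loop claim concrete, the following sketch builds the same pairs once with two for loops and once with product(), and also shows the repeat keyword for taking the product of an iterable with itself. The variable names are illustrative.

```python
import itertools

letters = ('a', 'b', 'c')
numbers = (1, 2)

# Explicit nested loops.
nested = []
for letter in letters:
    for number in numbers:
        nested.append((letter, number))

# The same pairs, in the same order, from a single call.
flat = list(itertools.product(letters, numbers))
print(nested == flat)
# output: True

# repeat=2 is equivalent to product('ab', 'ab').
self_pairs = list(itertools.product('ab', repeat=2))
print(self_pairs)
# output: [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```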
repeat()
Yet another infinite iterator, itertools.repeat() returns the same object over and over again, indefinitely unless the optional times argument is specified.
for i in itertools.repeat(20, 3):
    print(i)
# output:
# 20
# 20
# 20
for list1 in itertools.repeat([0, 1, 2], 3):
    print(list1)
# output:
# [0, 1, 2]
# [0, 1, 2]
# [0, 1, 2]
It is also possible to repeat a function:
for func in itertools.repeat(len, 3):
    print(func('cat'))
# output:
# 3
# 3
# 3
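A common use of repeat() without the times argument is to supply a constant argument to map(), which stops when its shortest input runs out. A small sketch with made-up variable names:

```python
import itertools

# repeat(2) provides the exponent for every base in range(5).
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)
# output: [0, 1, 4, 9, 16]
```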
starmap()
Applies a function to each item of a single iterable, where every item is itself a tuple that is unpacked into the arguments of the function.
tuples_list = [(5, 2), (4, 6), (1, 14)]
result = list(
    itertools.starmap(lambda x, y: x + y, tuples_list))
print(result)
# output: [7, 10, 15]
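The difference from the built-in map() is only in how arguments arrive: starmap() unpacks each tuple, whereas map() passes it as a single argument. A brief sketch with invented sample data:

```python
import itertools

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap() calls pow(2, 5), pow(3, 2), pow(10, 3).
with_starmap = list(itertools.starmap(pow, pairs))
print(with_starmap)
# output: [32, 9, 1000]

# With map(), the unpacking has to be done by hand.
with_map = list(map(lambda pair: pow(*pair), pairs))
print(with_map == with_starmap)
# output: True
```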
takewhile()
Opposite of dropwhile(). Returns an iterator that contains elements of the input iterator as long as a condition is true. After the condition is false the first time, none of the remaining items in the input are returned.
int_list = [0, 1, 2, 3, 4, 5, 6, -5]
result = list(
    itertools.takewhile(lambda x: x < 3, int_list))
print(result)
# output: [0, 1, 2]
def contains_character(string):
    substring = 'o'
    return substring in string
string_list = ['To', 'boldly', 'go', 'Tea', 'Earl', 'Grey', 'hot']
print(list(
    itertools.takewhile(contains_character, string_list)))
# output: ['To', 'boldly', 'go']
tee()
Returns several independent iterators based on a single original input. This is useful for feeding the same set of data into multiple algorithms. As the new iterators share their input, the original iterator should not be used after the new ones have been created.
int_list = [2, 5, 10, 22, 47]
list1, list2, list3 = itertools.tee(int_list, 3)
print(list(list1))
print(list(list2))
print(list(list3))
# output:
# [2, 5, 10, 22, 47]
# [2, 5, 10, 22, 47]
# [2, 5, 10, 22, 47]
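The warning about reusing the original iterator can be demonstrated with a short sketch. Here the source is an explicit iterator rather than a list, and the names are chosen for illustration; advancing the source directly makes the tee copies miss that value.

```python
import itertools

source = iter([1, 2, 3, 4])
copy_a, copy_b = itertools.tee(source)

# Advancing the original consumes a value the copies never see.
skipped = next(source)
print(skipped)
# output: 1

print(list(copy_a))
# output: [2, 3, 4]
print(list(copy_b))
# output: [2, 3, 4]
```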
zip_longest()
Returns an iterator that combines the elements of multiple iterators into tuples. It can be used even if the iterators produce different numbers of values. Unmatched elements are filled with None. To use another substitute value, the fillvalue argument can be specified.
first_list = ('ABCD')
second_list = ('XY')
result1 = list(
    itertools.zip_longest(first_list, second_list))
result2 = list(
    itertools.zip_longest(
        first_list, second_list, fillvalue='?'))
print(result1)
# output:
# [('A', 'X'), ('B', 'Y'), ('C', None), ('D', None)]
print(result2)
# output:
# [('A', 'X'), ('B', 'Y'), ('C', '?'), ('D', '?')]
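For comparison, the built-in zip() stops at the shortest input, while zip_longest() keeps going until the longest one is exhausted. A minimal sketch using the same two strings, with illustrative variable names:

```python
import itertools

first = 'ABCD'
second = 'XY'

# zip() silently drops the unmatched 'C' and 'D'.
truncated = list(zip(first, second))
print(truncated)
# output: [('A', 'X'), ('B', 'Y')]

# zip_longest() keeps them and pads the shorter input.
padded = list(itertools.zip_longest(first, second, fillvalue='-'))
print(padded)
# output: [('A', 'X'), ('B', 'Y'), ('C', '-'), ('D', '-')]
```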