Map Filter Reduce is not the Point -- Part 0, Introduction
Hi! Quick disclaimer: this is my first ever blog post, so my writing may still be a bit rough. I hope you might still get something out of it though :)
Target audience: Anyone interested in functional programming and relatively new to it; I won't assume any knowledge about it.
The Idea of this Series
For this series of blog posts, I want to clear up some common misconceptions I've seen people make whilst they are first learning about functional programming, and the "map, filter, reduce pattern" in particular. These basically boil down to "I can achieve the same thing with a for-loop / list-comprehension".
A good example of this is ArjanCodes' video on map and filter, which primarily compares directly replacing loops and if-statements with map and filter.
I see two perspectives that are missing from these analyses, both of which I will explore in more depth in follow-up blog posts:
- map, filter, reduce is a way of thinking that easily extends to more complex problems. There is a whole family of functions that are just about acting on data in a structured way. Alternative approaches like for-loops and list-comprehensions don't really work like this; they are a one-size-fits-all solution.
- map, filter, and reduce are functions, so they can be manipulated using functional techniques like partial function application and data processing pipelines; they are first-class citizens in functional languages. for-loops and list-comprehensions are syntax, and second-class citizens in most languages.
For code examples, I'll be using Python. The reason for this is that, even though I personally don't really like Python, I'm pretty comfortable with it, it's got decent support for functional programming, and most people should be able to read it relatively well. I'll also add some basic type-hinting to all of my examples. It should be pretty safe to ignore these if you want, especially since later on they will get a bit more involved.
- Side Note: Readability™ also gets thrown around a lot, but I won't address that here. I'll just say that I consider the kind of readability issues most people seem to have to really be about not being used to these tools, and I'll refer you to Rich Hickey's excellent talk "Simple Made Easy".
What is map, filter, reduce?
Just so we are all on the same page, I want to dedicate the rest of this introductory post to defining these operations I've been talking about. In short, map, filter, and reduce are functions that allow us to act on collections of values "all at once". Each is an example of a higher-order function, meaning that they take another function as an argument. This may seem a bit weird at first but is actually an extremely powerful idea. In object-oriented programming, you may call this a strategy pattern.
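To make the idea of a higher-order function concrete before we get to the real thing, here is a tiny (purely illustrative) example of my own: a function that takes another function as an argument and applies it twice.

```python
from collections.abc import Callable

# a minimal higher-order function: it takes another function `f`
# as an argument and applies it twice to the value `x`
def apply_twice(f: Callable[[int], int], x: int) -> int:
    return f(f(x))

print(apply_twice(lambda n: n + 3, 10))  # 16
```

Note how the caller decides *what* happens (the function passed in), while apply_twice only decides *when and how often* it happens. map, filter, and reduce work on exactly this principle.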
Feel free to skip this if you are already familiar with these operations.
map
map is probably the easiest to understand of the bunch. In the simplest case, it takes
- a function f with one argument and some kind of return value, and
- a collection coll of values,

and returns a new collection where f has been applied to each value in coll.
# the function we want to apply
def f(x: int) -> int:
return x * 2
coll = list(range(10))
print("original: ", coll)
# in python, `map` is lazy; we have to convert its result to a list to actually see it
print("no list conv: ", map(f, coll))
print("mapped: ", list(map(f, coll)))
original:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
no list conv:  <map object at 0x1001f41f0>
mapped:  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Notice how, after applying map, the shape of the given collection does not change, only its contents do.
We can achieve the same effect using list-comprehensions:
print("list-comprehension:", [f(x) for x in coll])
list-comprehension: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
At the beginning, I said "in the simplest case" because map actually works with functions of any arity. For example, we could add two sequences of numbers pair-wise using the following code:
# in python, there is no function for adding two numbers,
# so we have to define our own.
# (well technically there is, in the `operator` module, but I'll still define it here)
def add(x: int, y: int) -> int:
return x + y
another_coll = list(range(10, 20))
print("mapped:", list(map(add, coll, another_coll)))
mapped: [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
Again, we can achieve the same thing with a list comprehension, though this time, we have to use zip in order to create the pairs we want to act on:
print("list-comprehension:", [add(x, y) for x, y in zip(coll, another_coll)])
list-comprehension: [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
filter
filter, as the name suggests, lets us filter out values from a collection. Thus, where map lets us transform values in a collection, filter lets us transform the shape of the collection itself without touching the values.
It, too, takes two arguments:
- A function p that takes a single argument and returns either True or False (this is often called a predicate function, or simply a predicate). It is used to indicate whether or not to keep a specific value.
- A collection coll of values.
# the predicate that indicates whether we should keep values
def p(x: int) -> bool:
return x < 10
coll = list(range(0, 20, 2))
print("original: ", coll)
# in python, `filter` is also lazy.
print("filter object: ", filter(p, coll))
print("filter: ", list(filter(p, coll)))
# again, we can achieve the same effect with a list-comprehension
print("list-comprehension:", [x for x in coll if p(x)])
original:  [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
filter object:  <filter object at 0x105404190>
filter:  [0, 2, 4, 6, 8]
list-comprehension: [0, 2, 4, 6, 8]
Because both filter and map are lazy, they can be used well in conjunction, allowing us to only act on the data we actually want to:
above_10 = filter(lambda x: x > 10, coll)
multiplied = map(lambda x: x*3, above_10)
even = filter(lambda x: x % 2 == 0, multiplied)
# use `print` to illustrate that the last `map` doesn't "see" all data
printed = map(print, even)
list(printed)
36
42
48
54
This also lets us work with extremely large (or even infinite) sequences without any issues (as long as we limit the number of results we want at any point):
from collections.abc import Iterator
def mk_infinite_numbers() -> Iterator[int]:
    i = 0
    while True:
        yield i
        i = i + 1
def is_even(x: int) -> bool:
    return x % 2 == 0
def power_of_two(x: int) -> int:
    return 2 ** x
def starts_with_1(x: int) -> bool:
    return str(x)[0] == "1"
def take[T](n: int, seq: Iterator[T]) -> list[T]:
    """Return the first `n` values from a (possibly infinite) sequence."""
    return [next(seq) for _ in range(n)]
infinite_numbers = mk_infinite_numbers()
evens = filter(is_even, infinite_numbers)
even_powers_of_two = map(power_of_two, evens)
that_start_with_1 = filter(starts_with_1, even_powers_of_two)
print("first 10:", take(10, that_start_with_1))
print("next 10:", take(10, that_start_with_1))
first 10: [1, 16, 1024, 16384, 1048576, 16777216, 1073741824, 17179869184, 1099511627776, 17592186044416]
next 10: [1125899906842624, 18014398509481984, 1152921504606846976, 18446744073709551616, 1180591620717411303424, 18889465931478580854784, 1208925819614629174706176, 19342813113834066795298816, 1237940039285380274899124224, 19807040628566084398385987584]
reduce
reduce is by far the most difficult to understand (and also the most powerful) operation of the three. While the first two capture how we may change the shape and contents of a collection, returning a new collection, reduce allows us to aggregate collections into completely new kinds of data.
There are a few different definitions floating around of what exactly reduce requires. I'll use the strictest one, often also called fold, which is a function that takes three arguments:
- A function f with two arguments: the current aggregated value agg and the new value x to aggregate.
- A collection coll of values to aggregate.
- The starting value of agg.
I haven't really explained what "aggregating" means in this context. Roughly speaking, it refers to the idea of incrementally updating a value with new information. This may be something as simple as computing a sum, or as complex as managing the current state of a game given the collection of all player inputs. Strictly speaking, we can even define map and filter in terms of reduce, but I'll leave that as homework for the reader ;)
coll = list(range(1, 11))
print("original:", coll)
# in python, `reduce` is not in the global namespace.
# we have to import it from the builtin `functools` library.
from functools import reduce
print("sum, using reduce:", reduce(add, coll, 0))
original: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sum, using reduce: 55
Now this is where trivial alternatives start getting harder to come by. Of course, in Python, we could use the built-in sum function for this. But if we didn't have that, we would have to resort to for-loops and variable reassignment (something we want to avoid when writing programs in a functional style).
print("sum, using sum:", sum(coll))
def my_sum(coll: list[int]) -> int:
current_sum = 0
for x in coll:
current_sum = current_sum + x
return current_sum
print("sum, using for-loop:", my_sum(coll))
sum, using sum: 55
sum, using for-loop: 55
As for a less trivial example, let's calculate the cumulative sum, using reduce and a for-loop (we could also use itertools.accumulate, but that's effectively just a lazy reduce in this case):
# we'll just use the previous collection this time
def cumsum_increment(agg: list[int], x: int) -> list[int]:
new_value = agg[-1] + x
return agg + [new_value]
print("cumsum, using reduce:", reduce(cumsum_increment, coll, [0]))
def cumsum_using_for(coll: list[int]) -> list[int]:
cumsums = [0]
for x in coll:
new_value = cumsums[-1] + x
cumsums = cumsums + [new_value]
return cumsums
print("cumsum, using for-loop:", cumsum_using_for(coll))
cumsum, using reduce: [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
cumsum, using for-loop: [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
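To back up the claim about itertools.accumulate from above, here is a sketch of the lazy version (re-defining add and coll so the snippet stands on its own):

```python
from itertools import accumulate

def add(x: int, y: int) -> int:
    return x + y

coll = list(range(1, 11))

# `accumulate` lazily yields every intermediate result of the reduction;
# passing `initial=0` makes it match the `reduce`-based cumulative sum
print("cumsum, using accumulate:", list(accumulate(coll, add, initial=0)))
```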
You may have noticed some similarities between the two approaches. In fact, this is how we could implement reduce using a for-loop:
def my_reduce(f, coll, initial):
agg = initial
for x in coll:
agg = f(agg, x)
return agg
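We can sanity-check this home-made version against the builtin (the definition is repeated here so the snippet runs on its own):

```python
from functools import reduce

def add(x: int, y: int) -> int:
    return x + y

# same definition as above, repeated so this snippet is self-contained
def my_reduce(f, coll, initial):
    agg = initial
    for x in coll:
        agg = f(agg, x)
    return agg

coll = list(range(1, 11))
# both should compute the same sum
print("builtin reduce:", reduce(add, coll, 0))
print("my_reduce:     ", my_reduce(add, coll, 0))
```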
Bonus: Collecting Results, the General Way
As a bonus, I'd like to highlight a way we can implement the cumulative sum above by re-using the add function we defined previously. This solution is also more general, allowing us to cumulatively aggregate any kind of operation we may want to do with reduce (e.g. to keep track of the entire history of our game, so we can time-travel).
For this, we will be defining a new higher-order function that takes an "aggregator" function like above and turns it into one that returns a list of all of its results. For higher-order functions, type-hints can get a bit much, unfortunately, so I separated the two.
def cumulative_aggregation(f):
def wrapped(agg, x):
return agg + [f(agg[-1], x)]
return wrapped
Then, we can use it to calculate our cumulative sum:
print("cumsum, using function composition:", reduce(cumulative_aggregation(add), coll, [0]))
cumsum, using function composition: [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
Now, we can trivially do the same thing using any other aggregator function, like subtraction:
def minus(x: int, y: int) -> int:
return x - y
print("total difference:", reduce(minus, coll, 0))
print("cumulative difference:", reduce(cumulative_aggregation(minus), coll, [0]))
total difference: -55
cumulative difference: [0, -1, -3, -6, -10, -15, -21, -28, -36, -45, -55]
As promised, here's the type-hinted version, using Python's new syntax for generic types:
from collections.abc import Callable
def cumulative_aggregation_typed[Agg, X](
f: Callable[[Agg, X], Agg]
) -> Callable[[list[Agg], X], list[Agg]]:
def wrapped(agg: list[Agg], x: X) -> list[Agg]:
return agg + [f(agg[-1], x)]
return wrapped