Mindful Memory

Apr 20, 2021

Python is a high-level programming language and as such doesn’t offer the same precise control over memory as low-level languages such as C. Despite this, mindful Pythonistas can still be aware of their programs’ memory consumption and strive to write memory-efficient code.

There are many motivations for reducing memory usage, from processing data on a low-powered laptop to cutting production costs on a web server. Almost all of them start with looking at how your current program uses memory.


Let’s start with a simple function, getsizeof, in the sys module. It returns the size of an object in bytes. Here we can see the string “Python” consumes 55 bytes of memory.

>>> from sys import getsizeof
>>> getsizeof("Python")
55

getsizeof has limited usefulness because it only returns the direct memory consumption of an object. This means Python container objects, including dicts, lists, tuples, and sets, as well as class instances, won’t report the memory consumed by the objects they contain.
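To see the gap, here is a quick sketch (exact sizes are CPython implementation details and vary by version and platform): summing getsizeof over a list’s elements reveals memory that the list’s own size doesn’t account for.

```python
from sys import getsizeof

# A list of 1,000 moderately long strings.
words = ["Python" * 100 for _ in range(1000)]

# getsizeof reports only the list object itself:
# its header plus the array of element pointers.
shallow = getsizeof(words)

# Adding the elements' sizes exposes the real footprint.
deep = shallow + sum(getsizeof(w) for w in words)

print(shallow, deep)
```

For deeply nested containers you’d need a recursive walk, or a third-party tool such as Pympler’s asizeof.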


The tracemalloc module is a debugging tool that traces the memory blocks allocated by Python. It can easily be incorporated into scripts to find the biggest memory hogs.

Here is a simple test script, floats.py:

from random import random

floats = [random() for _ in range(10000000)]

def main():
    print(floats[:3])

if __name__ == "__main__":
    main()

This script simply generates a list of 10 million random floats and prints out the first three.

$ python floats.py 
[0.9937146951190611, 0.9803072888538571, 0.6177404904978063]

Now let’s add tracemalloc to our floats.py script and see what lines consume the most memory.

import tracemalloc
from random import random

tracemalloc.start()
floats = [random() for _ in range(10000000)]
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")

def main():
    # print(floats[:3])
    print("[ Top Ten ]")
    for stat in top_stats[:10]:
        print(stat)

if __name__ == "__main__":
    main()


First we import tracemalloc from the Standard Library. Then we start the trace with tracemalloc.start(), take a snapshot with tracemalloc.take_snapshot(), and calculate the statistics with snapshot.statistics("lineno"). Finally, we print the ten largest memory consumers.

Alright, now let’s run our altered script.

$ python floats.py 
[ Top Ten ]
/home/pete/floats.py:5: size=314 MiB, count=10000000, average=33 B
<frozen importlib._bootstrap_external>:587: size=40.2 KiB, count=444, average=93 B
<frozen importlib._bootstrap>:228: size=16.3 KiB, count=184, average=91 B
/usr/lib64/python3.9/random.py:101: size=3092 B, count=13, average=238 B
/usr/lib64/python3.9/random.py:815: size=2552 B, count=1, average=2552 B
/usr/lib64/python3.9/random.py:820: size=2272 B, count=2, average=1136 B
/usr/lib64/python3.9/random.py:771: size=2119 B, count=11, average=193 B
<frozen importlib._bootstrap_external>:64: size=841 B, count=8, average=105 B
<frozen importlib._bootstrap>:353: size=800 B, count=11, average=73 B
<frozen importlib._bootstrap>:1007: size=648 B, count=3, average=216 B

Here we can see that the code on line 5, where we defined the list of 10 million floats, consumed 314 MiB of memory. This was a simplistic example, but it shows how easy it is to inspect the memory usage of a Python script.
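The frozen importlib entries in the output are allocations made by Python’s own import machinery, not by our code. As a sketch of how to hide that noise (the filter patterns below are the ones suggested in the tracemalloc documentation), snapshot.filter_traces() can exclude those traces before computing statistics:

```python
import tracemalloc

tracemalloc.start()
data = [str(i) for i in range(100000)]
snapshot = tracemalloc.take_snapshot()

# Exclude allocations made inside the frozen import
# machinery, leaving only allocations from our own lines.
snapshot = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<frozen importlib._bootstrap_external>"),
))

for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
```

With the import noise gone, the top entries are the allocations your script actually made.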

You should now have a few tools for examining how Python scripts consume memory. This is a first step toward writing better, more efficient code, as we can now measure how the specific algorithms and functions we use directly affect memory usage.
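As one concrete step in that direction, here is a sketch (using one million values rather than ten million to keep it light; exact sizes vary by CPython version) comparing the list from floats.py with a generator expression, which yields values on demand instead of holding them all in memory at once:

```python
from random import random
from sys import getsizeof

n = 1000000

# The list allocates pointer storage for every element up front.
floats_list = [random() for _ in range(n)]

# The generator holds only its own state; each value is
# produced one at a time as the consumer asks for it.
floats_gen = (random() for _ in range(n))

print(getsizeof(floats_list))  # several MiB of pointer storage alone
print(getsizeof(floats_gen))   # a couple hundred bytes, regardless of n
```

The trade-off is that a generator can be consumed only once and doesn’t support slicing, so the floats[:3] from the earlier script would need something like itertools.islice instead.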