Memory Management

CAUTION: This section is not for the faint of heart. It is not required for the course, but it provides background that helps when optimising code. FEEL FREE TO SKIP IT.

NumPy arrays are generally contiguous, meaning they are stored in memory in a single block, e.g., the array:

import numpy as np

test_array = np.random.rand(3, 3)

test_array
array([[0.02341688, 0.71647735, 0.79925591],
       [0.62802362, 0.86511259, 0.28272151],
       [0.62962487, 0.18796037, 0.39864759]])

This array is stored in 9 slots of memory that sit next to each other. The dtype of the array is np.float64, which means that each element uses 64 bits, or 8 bytes, of memory (1 byte = 8 bits):

print(f"Data type of array: {test_array.dtype}")
print(f"Each element of the array uses {test_array.itemsize} bytes")
Data type of array: float64
Each element of the array uses 8 bytes
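
As a quick check of this arithmetic, the total size of the data is the number of elements multiplied by the bytes per element. The sketch below uses a stand-in array named demo_array (a name assumed here, filled with zeros because only the shape and dtype matter) and compares that product with NumPy's own nbytes attribute.

```python
import numpy as np

# Stand-in for test_array: only the shape and dtype matter here
demo_array = np.zeros((3, 3), dtype=np.float64)

n_elements = demo_array.size        # 9 elements
bytes_each = demo_array.itemsize    # 8 bytes per float64
total_bytes = n_elements * bytes_each

print(total_bytes)          # 72
print(demo_array.nbytes)    # 72, NumPy's own count for the data buffer
```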

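To perform any operation on a NumPy array, NumPy uses the array's “strides” behind the scenes. The strides give the number of bytes to step through memory to reach the next element along each axis: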

test_array.strides
(24, 8)

The latter number is the step size, in bytes, from one element to the next within a row; the former is the step size from one row to the next. In other words, our 3x3 array is stored in memory as a 1x9 array in which each element uses 8 bytes, so moving 3 elements along brings us to the start of a new row (3 x 8 = 24 bytes).
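
The stride arithmetic can be checked by hand. The following is a minimal sketch, assuming a deterministic 3x3 float64 array named grid and a helper byte_offset (both names are illustrative, not part of the course material): it computes where element (i, j) sits in the underlying buffer and confirms the result against the flattened array.

```python
import numpy as np

# Deterministic 3x3 float64 array: values 0.0 .. 8.0 stored row by row
grid = np.arange(9, dtype=np.float64).reshape(3, 3)
row_stride, col_stride = grid.strides   # (24, 8) for this layout

def byte_offset(i, j):
    """Bytes from the start of the buffer to element (i, j)."""
    return i * row_stride + j * col_stride

# Position in the flat (1x9) view is the byte offset divided by the element size
flat = grid.ravel()
i, j = 2, 1
position = byte_offset(i, j) // grid.itemsize

print(byte_offset(i, j))              # 56 = 2 * 24 + 1 * 8
print(position)                       # 7
print(flat[position] == grid[i, j])   # True
```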

We can alter the strides to do all sorts of magic, but we can also break things pretty terribly. The main takeaway from this exercise is that it is more efficient to process an array row by row rather than column by column, because the elements of a row sit next to each other in memory.
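
Two small illustrations of altering strides, offered as a sketch rather than a recipe. Transposing an array only swaps its strides, and numpy.lib.stride_tricks.as_strided lets us write strides by hand, here to build overlapping windows over a 1D array. The arrays grid and signal are assumed examples. as_strided performs no bounds checking, which is how things can break: a wrong shape or stride reads memory outside the array.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

grid = np.arange(9, dtype=np.float64).reshape(3, 3)

# Transposing copies nothing: NumPy returns a view with the strides swapped
transposed = grid.T
print(grid.strides)                         # (24, 8)
print(transposed.strides)                   # (8, 24)
print(np.shares_memory(grid, transposed))   # True

# Hand-written strides: overlapping windows of length 3 over a 1D array.
# The shape and strides chosen here stay inside the 6-element buffer.
signal = np.arange(6, dtype=np.float64)
step = signal.itemsize
windows = as_strided(signal, shape=(4, 3), strides=(step, step), writeable=False)
print(windows)
```

Passing writeable=False is a cautious choice: several entries of windows alias the same memory, so writing through such a view could have surprising effects.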

This row-by-row layout is known as a C-contiguous array (the name comes from the C programming language, which stores arrays row by row).
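
To see the layout and its effect in practice, the sketch below inspects the flags attribute of a C-ordered array and of a column-major (Fortran-ordered) copy, then times row-wise against column-wise sums. The array size and the helper names sum_by_rows and sum_by_columns are assumptions made for illustration. Timings depend on the machine, so no expected numbers are given.

```python
import numpy as np
from timeit import timeit

c_array = np.zeros((1000, 1000), dtype=np.float64)   # C order is the default
f_array = np.asfortranarray(c_array)                 # same values, column-major copy

print(c_array.flags["C_CONTIGUOUS"], c_array.strides)   # True (8000, 8)
print(f_array.flags["F_CONTIGUOUS"], f_array.strides)   # True (8, 8000)

def sum_by_rows(a):
    """Visit the array one row at a time."""
    return sum(a[i, :].sum() for i in range(a.shape[0]))

def sum_by_columns(a):
    """Visit the array one column at a time."""
    return sum(a[:, j].sum() for j in range(a.shape[1]))

# On a C-contiguous array each row is one unbroken run of memory,
# while each column is spread out with a gap of one row stride.
print("rows:   ", timeit(lambda: sum_by_rows(c_array), number=5))
print("columns:", timeit(lambda: sum_by_columns(c_array), number=5))
```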