Intermediate

Broadcasting¶

When we add a scalar to a 1-d array like this, the scalar gets added to each element of the array.

np.array([1,2,3]) + 0.5
# [1.5, 2.5, 3.5]

In essence, NumPy expands the scalar into a 3-element array and then does element-wise addition between the two arrays. (NumPy doesn't actually do this because it'd be horribly inefficient, but conceptually that's what's happening.) This is an example of broadcasting.
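As a rough illustration of that mental model, the sketch below uses np.broadcast_to to build the "expanded" scalar explicitly. The variable names are only for illustration, and the explicit expansion is shown purely to mirror the description; in practice you would just write x + 0.5.

```python
import numpy as np

x = np.array([1, 2, 3])

# Conceptual view: stretch the scalar to x's shape, then add element-wise.
# np.broadcast_to returns a read-only view, so no real copies are made here.
expanded = np.broadcast_to(0.5, x.shape)

explicit = x + expanded   # addition with the "expanded" scalar
implicit = x + 0.5        # what we normally write

print(expanded)                            # [0.5 0.5 0.5]
print(np.array_equal(explicit, implicit))  # True
```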

Compatibility¶

Not every pair of arrays is compatible for broadcasting. Suppose we want to add two arrays, A and B.

Moving backwards from the last dimension of each array,
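1. We check if their dimensions are "compatible". Two dimensions are compatible if they are equal or if either of them is 1.
2. If every dimension of the array with fewer dimensions is compatible with the corresponding dimension of the other array, A and B are compatible arrays. Leading dimensions that only one of the arrays has never cause a conflict.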
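The two rules can be sketched as a small function. shapes_are_compatible is a hypothetical helper written for this illustration, not part of NumPy, and the shapes passed to it are arbitrary. NumPy's own np.broadcast_shapes (available since NumPy 1.20) performs the real check.

```python
import numpy as np

def shapes_are_compatible(shape_a, shape_b):
    """Hypothetical helper: apply the two rules above to a pair of shapes."""
    # Walk both shapes from the last dimension backwards; zip stops at the
    # shorter shape, so extra leading dimensions are never compared.
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    return True

print(shapes_are_compatible((3, 4), (3, 1)))     # True
print(shapes_are_compatible((4, 4), (2, 1)))     # False
print(shapes_are_compatible((3, 1, 4), (2, 1)))  # True

# NumPy's own check: np.broadcast_shapes returns the resulting shape,
# or raises ValueError when the shapes are incompatible.
print(np.broadcast_shapes((3, 1, 4), (2, 1)))    # (3, 2, 4)
```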

Examples¶

Example 1¶

np.random.seed(1234)
A = np.random.randint(low = 1, high = 10, size = (3, 4))
B = np.random.randint(low = 1, high = 10, size = (3, 1))

print(A)
# [[4 7 6 5]
#  [9 2 8 7]
#  [9 1 6 1]]

print(B)
# [[7]
#  [3]
#  [1]]

A + B = ???

Compatibility

A.shape  # (3, 4)
B.shape  # (3, 1)
##          ^  ^
##         compatible

Here, A is a 3x4 array and B is a 3x1 array. We start by comparing the last dimension of each array.

Since the last dimension of A is length 4 and the last dimension of B is length 1, numpy can expand B by making 4 copies of it along its second axis. So, these dimensions are compatible.
Now we have to compare the first dimension of A and B. Since they're both length 3, they’re compatible.

The only thing left for NumPy is to carry out whatever operation we wanted on two equivalently sized 3x4 arrays. (Remember, NumPy doesn't actually expand B like this because it'd be horribly inefficient.)
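To see the result of this example, the sketch below rebuilds A and B from the printed values (rather than relying on the random seed) and adds them.

```python
import numpy as np

# Values copied from the printed arrays above, so no random seed is needed
A = np.array([[4, 7, 6, 5],
              [9, 2, 8, 7],
              [9, 1, 6, 1]])
B = np.array([[7],
              [3],
              [1]])

result = A + B  # B's single column is added to every column of A
print(result)
# [[11 14 13 12]
#  [12  5 11 10]
#  [10  2  7  2]]
```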

Example 2¶

np.random.seed(4321)
A = np.random.randint(low = 1, high = 10, size = (4, 4))
B = np.random.randint(low = 1, high = 10, size = (2, 1))

print(A)
# [[3 9 3 2]
#  [8 6 3 5]
#  [7 1 9 7]
#  [6 4 2 2]]

print(B)
# [[7]
#  [2]]

A + B = ???

Compatibility

A.shape  # (4, 4)
B.shape  # (2, 1)
##          ^  ^
##         not compatible

Here, A is a 4x4 array and B is a 2x1 array.

The last dimension of A is length 4 and the last dimension of B is length 1, so these dimensions are compatible. We can temporarily transform B by making 4 copies of it along its 2nd axis.
Now we compare the 1st dimension of each array. A has length 4 and B has length 2. These lengths are not equal and neither of them is 1, so there is no way to expand B into a 4x4 array to match A or vice versa. These arrays are not compatible.
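In practice, NumPy reports this kind of incompatibility by raising a ValueError. The sketch below uses arrays of ones as stand-ins, since only the shapes matter here.

```python
import numpy as np

A = np.ones((4, 4), dtype=int)   # stand-in values; only the shapes matter
B = np.ones((2, 1), dtype=int)

error = None
try:
    A + B
except ValueError as err:
    error = err

print(type(error).__name__)  # ValueError
print(error)                 # the message names the two offending shapes
```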

Example 3¶

np.random.seed(1111)
A = np.random.randint(low = 1, high = 10, size = (3, 1, 4))
B = np.random.randint(low = 1, high = 10, size = (2, 1))

print(A)
# [[[8 6 2 3]]
#  [[5 9 7 5]]
#  [[9 7 3 7]]]

print(B)
# [[9]
#  [4]]

A + B = ???

Compatibility

A.shape  # (3, 1, 4)
B.shape  # (   2, 1)
#           ^  ^  ^
#         compatible

Here, A is a 3x1x4 array and B is a 2x1 array.

We start by comparing the last dimension of each array. In this case, A is length 4 and B is length 1, so we can expand B into a 2x4 array, making these dimensions compatible.
Next, we compare the 2nd to last dimension of each array. In this case, A is length 1 and B is length 2. This time, we expand A, copying it twice along its second axis to match B.
At this point, we're out of B dimensions, so we know A and B are compatible. To complete our mental model of how math between these arrays would work, we can imagine copying B 3 times along a newly added first dimension.
We're left with two transformed arrays, each with shape 3x2x4, which we can easily add.
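The sketch below checks this walkthrough, again rebuilding A and B from the printed values. np.broadcast_arrays is used only to make the two "transformed" arrays visible; A + B does not need it.

```python
import numpy as np

A = np.array([[[8, 6, 2, 3]],
              [[5, 9, 7, 5]],
              [[9, 7, 3, 7]]])   # shape (3, 1, 4)
B = np.array([[9],
              [4]])              # shape (2, 1)

# np.broadcast_arrays returns views of A and B stretched to the common shape
A_wide, B_wide = np.broadcast_arrays(A, B)
print(A_wide.shape, B_wide.shape)  # (3, 2, 4) (3, 2, 4)

total = A + B
print(total.shape)  # (3, 2, 4)
print(total[0])
# [[17 15 11 12]
#  [12 10  6  7]]
```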

newaxis¶

np.newaxis allows us to promote the dimensionality of an array by giving it a new axis.

For example, suppose you have 1-d arrays A and B,

A = np.array([3, 11, 4, 5])
B = np.array([5, 0, 3])

and your goal is to build a difference matrix where element \((i,j)\) gives \(A_i - B_j\). In other words, your goal is to subtract each element of B from each element of A.

If you do A - B, you'll get an error because the arrays don't have compatible shapes, and even if they were the same size, numpy would just do element-wise subtraction.

However, if A were a 4x1 array and B were a 1x3 array, NumPy would broadcast the arrays so that A - B produces the difference matrix we desire.

We can convert A from a (4,) array into a (4,1) array via

A[:, np.newaxis]
# array([[ 3],
#        [11],
#        [ 4],
#        [ 5]])

Similarly, we can convert B from a (3,) array into a (1,3) array via

B[np.newaxis, :]
# array([[5, 0, 3]])

Then we can calculate the difference matrix as

A[:, np.newaxis] - B[np.newaxis, :]
# array([[-2,  3,  0],
#        [ 6, 11,  8],
#        [-1,  4,  1],
#        [ 0,  5,  2]])

We can further simplify this to A[:, np.newaxis] - B since broadcasting rules will make B compatible with A[:, np.newaxis].

Note

newaxis is just an alias for None, so A[:, np.newaxis] - B is equivalent to A[:, None] - B.
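As a quick check, the sketch below computes the difference matrix in several ways and confirms they agree. The reshape and np.subtract.outer variants are not mentioned above; they are included only as alternative spellings of the same computation.

```python
import numpy as np

A = np.array([3, 11, 4, 5])
B = np.array([5, 0, 3])

with_newaxis = A[:, np.newaxis] - B
with_none = A[:, None] - B
with_reshape = A.reshape(-1, 1) - B     # -1 lets NumPy infer the length (4)
with_outer = np.subtract.outer(A, B)    # ufunc method computing every A_i - B_j

print(with_newaxis.shape)  # (4, 3)
print(np.newaxis is None)  # True
```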

reshape()¶

You can use the reshape() function to change the shape of an array.

For example,

foo = np.arange(start=1, stop=9)
# [1 2 3 4 5 6 7 8]

We can reshape foo into a 2x4 array using either np.reshape()

np.reshape(foo, (2, 4))  # positional arguments work across NumPy versions
# array([[1, 2, 3, 4],
#        [5, 6, 7, 8]])

or the reshape() method of the array object.

foo.reshape(2,4)
# array([[1, 2, 3, 4],
#        [5, 6, 7, 8]])

These methods implement the same logic, just with slightly different interfaces.

Info

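With foo.reshape(), we can pass in the new dimensions individually instead of as a tuple, but this comes at the expense of not being able to pass the shape by keyword, as np.reshape() allows. (That keyword is newshape in older NumPy releases; NumPy 2.1 renamed it to shape and deprecated newshape.)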
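The sketch below compares the different call styles and shows what happens when the requested shape does not fit. The 3x3 target is just an example of a shape with the wrong number of elements.

```python
import numpy as np

foo = np.arange(start=1, stop=9)

as_args = foo.reshape(2, 4)       # dimensions passed individually
as_tuple = foo.reshape((2, 4))    # dimensions passed as one tuple
as_function = np.reshape(foo, (2, 4))

print(np.array_equal(as_args, as_tuple))     # True
print(np.array_equal(as_args, as_function))  # True

# The new shape must hold exactly the same number of elements (8 here)
error = None
try:
    foo.reshape(3, 3)
except ValueError as err:
    error = err
print(type(error).__name__)  # ValueError
```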

Array Transpose¶

You can also transpose an array using np.transpose() or the .T attribute of an array object.

bar = np.array([[1,2,3,4], [5,6,7,8]])

print(bar)
# [[1 2 3 4]
#  [5 6 7 8]]

print(bar.T)
# [[1 5]
#  [2 6]
#  [3 7]
#  [4 8]]
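For completeness, here is a short sketch of the function form, np.transpose(), checked against the .T attribute.

```python
import numpy as np

bar = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

transposed = np.transpose(bar)
print(transposed.shape)                   # (4, 2)
print(np.array_equal(transposed, bar.T))  # True

# Element (i, j) of bar becomes element (j, i) of the transpose
print(bar[1, 2], transposed[2, 1])        # 7 7
```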

Boolean Indexing¶

With boolean indexing, you can subset an array A using another array B of boolean values.

Examples¶

Example 1¶

Suppose we have a 3x3 array, foo

foo = np.array([
    [3, 9, 7],
    [2, 0, 3],
    [3, 3, 1]
])

and we set mask = foo == 3

mask = foo == 3

print(mask)
# [[ True False False]
#  [False False  True]
#  [ True  True False]]

We can use mask to identify elements of foo which are equal to 3.

print(foo[mask])
# [3 3 3 3]

Furthermore, we can use mask to convert 3s in foo to 0s.

foo[mask] = 0

print(foo)
# [[0 9 7]
#  [2 0 0]
#  [0 0 1]]

Example 2¶

Just like integer arrays, we can use 1-d boolean arrays to pick out specific rows or columns of a 2-d array.

Consider this 3x3 array, foo

foo = np.array([
    [3, 9, 7],
    [2, 0, 3],
    [3, 3, 1]
])

and these 1-d, length 3 boolean arrays r13 and c23.

r13 = np.array([True, False, True])
c23 = np.array([False, True, True])

We can use r13 to select rows 1 and 3 from foo.

print(foo[r13])
# [[3 9 7]
#  [3 3 1]]

We can use c23 to select columns 2 and 3 from foo.

print(foo[:, c23])
# [[9 7]
#  [0 3]
#  [3 1]]

Observe what happens when we index foo with both r13 and c23.

print(foo[r13, c23])
# [9 1]

This is equivalent to foo[[0,2], [1,2]]

NumPy treats boolean indices like integer indices, where the integers used are the indices of True elements. In other words, NumPy treats the boolean index array [True, False, True] just like the integer index array [0, 2] and it treats the boolean index array [False, True, True] just like the integer index array [1,2].

So, foo[r13, c23] is equivalent to foo[[0, 2], [1, 2]]. Recall that when you combine row and column index arrays in this way, NumPy uses corresponding indices from each index array to select elements from the target array - in this case, elements (0,1) and (2,2).
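The sketch below makes the boolean-to-integer conversion explicit with np.flatnonzero. It also shows np.ix_, which is not covered above, as one possible way to select the full block where the chosen rows and columns intersect, in case that is what you actually wanted.

```python
import numpy as np

foo = np.array([
    [3, 9, 7],
    [2, 0, 3],
    [3, 3, 1]
])
r13 = np.array([True, False, True])
c23 = np.array([False, True, True])

# Positions of the True values in each mask
rows = np.flatnonzero(r13)
cols = np.flatnonzero(c23)
print(rows, cols)        # [0 2] [1 2]

paired = foo[rows, cols]
print(paired)            # [9 1]  (same as foo[r13, c23])

# To get the full block where the selected rows and columns intersect,
# np.ix_ builds an open mesh from the two masks
block = foo[np.ix_(r13, c23)]
print(block)
# [[9 7]
#  [3 1]]
```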

Logical Operators¶

Logical operators let us combine boolean arrays. They include the "bitwise-and" operator, the "bitwise-or" operator, and the "bitwise-xor" operator.

b1 = np.array([False, False, True, True])
b2 = np.array([False, True, False, True])

b1 & b2  # [False, False, False,  True], and
b1 | b2  # [False,  True,  True,  True], or
b1 ^ b2  # [False,  True,  True, False], xor
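A common use of these operators is combining comparisons into a single mask. The sketch below uses a made-up array x; note the parentheses around each comparison, which are needed because & and | bind more tightly than comparison operators in Python.

```python
import numpy as np

x = np.array([1, 5, 8, 12])

# Parentheses are required: & binds more tightly than > and <
in_range = (x > 2) & (x < 10)
print(in_range)     # [False  True  True False]
print(x[in_range])  # [5 8]

outside = (x <= 2) | (x >= 10)
print(x[outside])   # [ 1 12]
```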

Boolean Negation¶

We can negate a boolean array by preceding it with a tilde ~.

~np.array([False, True])
# array([ True, False])

NaN¶

You can use NaN to represent missing or invalid values. NaN is a floating point constant that numpy reserves and treats specially.

For example, consider this array called bot which contains two missing values.

bot = np.ones(shape = (3, 4))
bot[[0, 2], [1, 2]] = np.nan

print(bot)
# [[ 1. nan  1.  1.]
#  [ 1.  1.  1.  1.]
#  [ 1.  1. nan  1.]]

If you want to identify which elements of bot are NaN, you might be inclined to try bot == np.nan but the result may surprise you.

print(bot == np.nan)
# [[False False False False]
#  [False False False False]
#  [False False False False]]

NumPy follows the IEEE 754 floating point standard here: nan == nan returns False, but nan != nan returns True.

np.nan == np.nan  # False
np.nan != np.nan  # True

This is because equivalence between missing or invalid values is not well-defined.

In order to see which elements of an array are NaN, you can use NumPy's isnan() function.

np.isnan(bot)
# array([[False,  True, False, False],
#        [False, False, False, False],
#        [False, False,  True, False]])
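Since isnan() returns a boolean array, it combines naturally with boolean indexing and negation. The sketch below counts the missing values in bot, sums the values that are present, and builds a filled copy; replacing NaN with 0 is just one arbitrary choice of fill value.

```python
import numpy as np

bot = np.ones(shape=(3, 4))
bot[[0, 2], [1, 2]] = np.nan

missing = np.isnan(bot)
print(missing.sum())         # 2 (True counts as 1)
print(bot[~missing].sum())   # 10.0, the sum of the ten non-NaN values

# Replace every NaN with 0 without modifying bot
filled = np.where(missing, 0.0, bot)
print(filled.sum())          # 10.0
```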

Caution

NaN is a special floating point constant, so it can only exist inside an array of floats. If you try inserting NaN into an array of integers, booleans, or strings, you’ll get an error or unexpected behavior.
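The sketch below shows the integer case, where the assignment raises a ValueError, along with one possible workaround: converting the array to a float dtype first.

```python
import numpy as np

ints = np.array([1, 2, 3])

error = None
try:
    ints[0] = np.nan         # an integer array has no way to store NaN
except ValueError as err:
    error = err
print(type(error).__name__)  # ValueError

# Converting to a float dtype first makes room for NaN
floats = ints.astype(float)
floats[0] = np.nan
print(floats)                # [nan  2.  3.]
```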

Infinity¶

Like NaN, NumPy reserves floating point constants for infinity and negative infinity that behave specially.

If you want to insert these values directly, you can use np.inf and -np.inf. (The older alias np.NINF was removed in NumPy 2.0.)

np.array([np.inf, -np.inf])
# array([ inf, -inf])

More commonly, these values occur when you divide a nonzero number by 0. (NumPy also emits a RuntimeWarning when this happens.)

np.array([-1, 1])/0
# array([-inf,  inf])
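If the division by zero is intentional, one option is to silence the warning locally with np.errstate, as sketched below. np.isinf and np.isfinite can then be used to locate the infinite values.

```python
import numpy as np

# np.errstate temporarily silences the divide-by-zero RuntimeWarning
with np.errstate(divide="ignore"):
    result = np.array([-1, 1]) / 0

print(result)               # [-inf  inf]
print(np.isinf(result))     # [ True  True]
print(np.isfinite(result))  # [False False]
print(np.inf > 1e308)       # True
```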

random¶

You can use NumPy's random module to shuffle arrays, sample values from arrays, and draw values from a host of probability distributions.

Generators¶

Since NumPy version 1.17.0, it is recommended to use a Generator to produce random values rather than calling the legacy functions of the random module (such as np.random.seed() and np.random.randint()) directly.

In most cases, the default random number generator is sufficient.

Initialize default_rng without a seed

rng = np.random.default_rng()

Initialize default_rng with a seed

rng = np.random.default_rng(12345)
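The point of seeding is reproducibility. As a small sketch, two generators created with the same seed produce the same sequence of values; the names rng_a and rng_b are only for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

draws_a = rng_a.random(3)   # three floats in [0, 1)
draws_b = rng_b.random(3)
print(np.array_equal(draws_a, draws_b))  # True

# Continuing to draw from the same generator advances its state,
# so these are the next values in the sequence, not a repeat
more = rng_a.random(3)
```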

Examples¶

Sample integers in range with replacement¶

Draw three integers from the range 1 to 6, with replacement.

generator = np.random.default_rng(seed=123)
generator.integers(low=1, high=7, size=3)
# array([1, 5, 4])

See random.Generator.integers

Sample integers in range without replacement¶

Draw three integers from the range 0 to 9, without replacement.

generator = np.random.default_rng(seed=123)
generator.choice(a=10, size=3, replace=False)
# array([5, 6, 0])

See random.Generator.choice

Randomly permute the rows of a 2-d array¶

Randomly shuffle the rows of this 5x2 array, foo

foo = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
    [7, 8],
    [9, 10]
])
generator = np.random.default_rng(seed=123)
generator.permutation(foo, axis=0)
# array([[ 9, 10],
#        [ 1,  2],
#        [ 5,  6],
#        [ 7,  8],
#        [ 3,  4]])

See random.Generator.permutation

Random sample from uniform distribution¶

Randomly sample four values between 1 and 2, then output as a 2x2 array.

generator = np.random.default_rng(seed=123)
generator.uniform(low=1.0, high=2.0, size=(2, 2))
# array([[1.68235186, 1.05382102],
#        [1.22035987, 1.18437181]])

See random.Generator.uniform

Random sample from normal distribution¶

Randomly sample two values from a standard normal distribution, then output as a length-2 1-d array.

generator = np.random.default_rng(seed=123)
generator.normal(loc=0.0, scale=1.0, size=2)
# array([-0.98912135, -0.36778665])

See random.Generator.normal

Random sample from binomial distribution¶

Randomly sample six values from a binomial distribution with n=10 and p=0.25, then output as a 3x2 array.

generator = np.random.default_rng(seed=123)
generator.binomial(n=10, p=0.25, size=(3, 2))
# array([[3, 0],
#        [1, 1],
#        [1, 4]])

See random.Generator.binomial