Intermediate
Broadcasting¶
When we add a scalar to a 1-d array like this, the scalar gets added to each element of the array.
import numpy as np
np.array([1, 2, 3]) + 0.5
# [1.5, 2.5, 3.5]
In essence, NumPy expands the scalar into a 3-element array and then performs element-wise addition between the two arrays. (NumPy doesn't actually build the expanded array because that would be horribly inefficient, but conceptually that's what happens.) This is an example of broadcasting.
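As a rough illustration of the mental model described here, the sketch below uses np.broadcast_to to build the "expanded" scalar explicitly. The variable names are ours, chosen only for this example.

```python
import numpy as np

arr = np.array([1, 2, 3])

# View the scalar as if it were a 3-element array.
# broadcast_to returns a read-only view; no data is copied.
expanded = np.broadcast_to(0.5, arr.shape)

# A stride of 0 means every "element" points at the same memory.
print(expanded.strides)  # (0,)

# Adding the expanded view gives the same result as adding the scalar.
result_explicit = arr + expanded
result_scalar = arr + 0.5
```

The zero stride is why the expansion costs nothing: the same value is simply read three times.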
Compatibility¶
Not every pair of arrays is compatible for broadcasting. Suppose we want to add two arrays, A and B.
- Moving backwards from the last dimension of each array,
- We check if their dimensions are "compatible". Two dimensions are compatible if they are equal or if either of them is 1.
- If all of A's dimensions are compatible with B's dimensions, or vice versa, they are compatible arrays. Dimensions missing from the shorter shape are treated as length 1.
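The rule above can be sketched as a small helper. Note that are_broadcastable is a hypothetical function written for this illustration, not part of NumPy; np.broadcast_shapes is a real NumPy function (available in version 1.20 and later) that computes the resulting shape.

```python
from itertools import zip_longest

import numpy as np


def are_broadcastable(shape_a, shape_b):
    """Return True if two shapes are compatible for broadcasting."""
    # Walk both shapes from the last dimension backwards.
    # A missing dimension in the shorter shape is treated as length 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    return all(a == b or a == 1 or b == 1 for a, b in pairs)


print(are_broadcastable((3, 4), (3, 1)))     # True
print(are_broadcastable((4, 4), (2, 1)))     # False
print(are_broadcastable((3, 1, 4), (2, 1)))  # True

# NumPy can compute the broadcast result shape directly.
print(np.broadcast_shapes((3, 1, 4), (2, 1)))  # (3, 2, 4)
```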
Examples¶
Example 1¶
np.random.seed(1234)
A = np.random.randint(low = 1, high = 10, size = (3, 4))
B = np.random.randint(low = 1, high = 10, size = (3, 1))
print(A)
# [[4 7 6 5]
# [9 2 8 7]
# [9 1 6 1]]
print(B)
# [[7]
# [3]
# [1]]
A + B
= ???
Compatibility
A.shape # (3, 4)
B.shape # (3, 1)
## ^ ^
## compatible
Here, A is a 3x4 array and B is a 3x1 array. We start by comparing the last dimension of each array.
- Since the last dimension of A is length 4 and the last dimension of B is length 1, NumPy can expand B by making 4 copies of it along its second axis. So, these dimensions are compatible.
- Now we have to compare the first dimension of A and B. Since they're both length 3, they're compatible.
The only thing left for NumPy is to carry out whatever operation we wanted on two equally sized 3x4 arrays. (Remember, NumPy doesn't actually expand B like this because it'd be horribly inefficient.)
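To make Example 1 concrete, here is a small sketch that hard-codes the values printed above (so it doesn't depend on the seed) and compares the broadcast sum with an explicitly expanded copy of B. The use of np.tile is our choice for illustrating the mental model; it is not what NumPy does internally.

```python
import numpy as np

# Values copied from the printed output above, so no seed is needed.
A = np.array([[4, 7, 6, 5],
              [9, 2, 8, 7],
              [9, 1, 6, 1]])
B = np.array([[7],
              [3],
              [1]])

# Explicitly repeat B's single column 4 times (the mental model).
B_expanded = np.tile(B, (1, 4))

print(A + B)
# [[11 14 13 12]
#  [12  5 11 10]
#  [10  2  7  2]]
```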
Example 2¶
np.random.seed(4321)
A = np.random.randint(low = 1, high = 10, size = (4, 4))
B = np.random.randint(low = 1, high = 10, size = (2, 1))
print(A)
# [[3 9 3 2]
# [8 6 3 5]
# [7 1 9 7]
# [6 4 2 2]]
print(B)
# [[7]
# [2]]
A + B
= ???
Compatibility
A.shape # (4, 4)
B.shape # (2, 1)
## ^ ^
## not compatible
Here, A is a 4x4 array and B is a 2x1 array.
- The last dimension of A is length 4 and the last dimension of B is length 1, so these dimensions are compatible. We can temporarily transform B by making 4 copies of it along its 2nd axis.
- Now we compare the 1st dimension of each array. A is length 4 and B is length 2. The lengths differ and neither is 1, so there isn't an obvious way to expand B to match A or vice versa. These arrays are not compatible.
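A minimal sketch of what happens if we try the addition anyway, using arrays of ones with the same shapes as in Example 2. The compatible flag is a hypothetical variable used only to record the outcome.

```python
import numpy as np

A = np.ones((4, 4))
B = np.ones((2, 1))

try:
    A + B
    compatible = True
except ValueError as err:
    # NumPy reports that the operands could not be broadcast together.
    compatible = False
    print(err)
```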
Example 3¶
np.random.seed(1111)
A = np.random.randint(low = 1, high = 10, size = (3, 1, 4))
B = np.random.randint(low = 1, high = 10, size = (2, 1))
print(A)
# [[[8 6 2 3]]
# [[5 9 7 5]]
# [[9 7 3 7]]]
print(B)
# [[9]
# [4]]
A + B
= ???
Compatibility
A.shape # (3, 1, 4)
B.shape # ( 2, 1)
# ^ ^ ^
# compatible
Here, A is a 3x1x4 array and B is a 2x1 array.
- We start by comparing the last dimension of each array. In this case, A is length 4 and B is length 1, so we can expand B into a 2x4 array, making these dimensions compatible.
- Next, we compare the 2nd to last dimension of each array. In this case, A is length 1 and B is length 2. This time, we expand A, copying it twice along its second axis to match B.
- At this point, we're out of B dimensions, so we know A and B are compatible. To complete our mental model of how math between these arrays would work, we can imagine copying B 3 times along a newly added first dimension.
- We're left with two transformed arrays, each with shape 3x2x4, which we can easily add.
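The walk-through for Example 3 can be checked with a short sketch. The array values are copied from the printed output above; np.broadcast_arrays is used here only to show the common shape both arrays are stretched to.

```python
import numpy as np

A = np.array([[[8, 6, 2, 3]],
              [[5, 9, 7, 5]],
              [[9, 7, 3, 7]]])   # shape (3, 1, 4)
B = np.array([[9],
              [4]])              # shape (2, 1)

# broadcast_arrays returns views of A and B stretched to a common shape.
A_b, B_b = np.broadcast_arrays(A, B)
print(A_b.shape, B_b.shape)  # (3, 2, 4) (3, 2, 4)

total = A + B
print(total[0])
# [[17 15 11 12]
#  [12 10  6  7]]
```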
newaxis¶
np.newaxis allows us to promote the dimensionality of an array by giving it a new axis.
For example, suppose you have 1-d arrays A and B,
A = np.array([3, 11, 4, 5])
B = np.array([5, 0, 3])
and your goal is to build a difference matrix where element \((i,j)\) gives \(A_i - B_j\). In other words, your goal is to subtract each element of B from each element of A.
If you do A - B, you'll get an error because the arrays don't have compatible shapes, and even if they were the same size, NumPy would just do element-wise subtraction.
However, if A were a 4x1 array and B were a 1x3 array, NumPy would broadcast the arrays so that A - B produces the difference matrix we desire.
We can convert A from a (4,) array into a (4,1) array via
A[:, np.newaxis]
# array([[ 3],
# [11],
# [ 4],
# [ 5]])
Similarly, we can convert B from a (3,) array into a (1,3) array via
B[np.newaxis, :]
# array([[5, 0, 3]])
Then we can calculate the difference matrix as
A[:, np.newaxis] - B[np.newaxis, :]
# array([[-2, 3, 0],
# [ 6, 11, 8],
# [-1, 4, 1],
# [ 0, 5, 2]])
We can further simplify this to A[:, np.newaxis] - B since broadcasting rules will make B compatible with A[:, np.newaxis].
Note
newaxis is just an alias for None, so A[:, np.newaxis] - B is equivalent to A[:, None] - B.
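As a cross-check on the difference matrix, the sketch below compares the newaxis approach with np.subtract.outer, which computes the same "every element of A minus every element of B" table. Using the ufunc's outer method is our addition for comparison; the variable names are illustrative.

```python
import numpy as np

A = np.array([3, 11, 4, 5])
B = np.array([5, 0, 3])

print(A[:, None].shape)  # (4, 1)

# Broadcasting a (4, 1) array against a (3,) array.
diff = A[:, np.newaxis] - B

# Same matrix via the subtract ufunc's outer method.
diff_outer = np.subtract.outer(A, B)
```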
reshape()¶
You can use the reshape() function to change the shape of an array.
For example,
foo = np.arange(start=1, stop=9)
# [1 2 3 4 5 6 7 8]
We can reshape foo into a 2x4 array using either np.reshape()
np.reshape(foo, (2, 4))  # shape passed positionally; the newshape keyword is deprecated in NumPy 2.1+
# array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
or the reshape() method of the array object.
foo.reshape(2,4)
# array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
These two approaches implement the same logic, just with slightly different interfaces.
Info
With foo.reshape(), we can pass in the new dimensions individually instead of as a tuple, but this comes at the expense of not being able to specify the newshape keyword.
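The sketch below puts the interfaces side by side. It passes the shape positionally to np.reshape so that it does not depend on a particular keyword name; the variable names are illustrative.

```python
import numpy as np

foo = np.arange(start=1, stop=9)

by_args = foo.reshape(2, 4)        # dimensions passed individually
by_tuple = foo.reshape((2, 4))     # dimensions passed as one tuple
by_func = np.reshape(foo, (2, 4))  # function form, shape given positionally

print(by_args.shape)  # (2, 4)
```

All three calls produce the same 2x4 array.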
Array Transpose¶
You can also transpose an array using np.transpose() or the .T attribute of an array object.
bar = np.array([[1,2,3,4], [5,6,7,8]])
print(bar)
# [[1 2 3 4]
# [5 6 7 8]]
print(bar.T)
# [[1 5]
# [2 6]
# [3 7]
# [4 8]]
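Since only .T is demonstrated above, here is a brief sketch showing that np.transpose() gives the same result for a 2-d array. The variable names are ours.

```python
import numpy as np

bar = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

via_func = np.transpose(bar)
via_attr = bar.T

print(bar.shape, via_func.shape)  # (2, 4) (4, 2)

# Element (i, j) of the transpose is element (j, i) of the original.
print(via_func[3, 0] == bar[0, 3])  # True
```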
Boolean Indexing¶
With boolean indexing, you can subset an array A using another array B of boolean values.
Examples¶
Example 1¶
Suppose we have a 3x3 array, foo
foo = np.array([
[3, 9, 7],
[2, 0, 3],
[3, 3, 1]
])
and we set mask = foo == 3
mask = foo == 3
print(mask)
# [[ True False False]
# [False False True]
# [ True True False]]
We can use mask to identify elements of foo which are equal to 3.
print(foo[mask])
# [3 3 3 3]
Furthermore, we can use mask to convert 3s in foo to 0s.
foo[mask] = 0
print(foo)
# [[0 9 7]
# [2 0 0]
# [0 0 1]]
Example 2¶
Just like integer arrays, we can use 1-d boolean arrays to pick out specific rows or columns of a 2-d array.
Consider this 3x3 array, foo
foo = np.array([
[3, 9, 7],
[2, 0, 3],
[3, 3, 1]
])
and these 1-d, length 3 boolean arrays r13 and c23.
r13 = np.array([True, False, True])
c23 = np.array([False, True, True])
We can use r13 to select rows 1 and 3 from foo.
print(foo[r13])
# [[3 9 7]
# [3 3 1]]
We can use c23 to select columns 2 and 3 from foo.
print(foo[:, c23])
# [[9 7]
# [0 3]
# [3 1]]
Observe what happens when we index foo with both r13 and c23.
print(foo[r13, c23])  # equivalent to foo[[0, 2], [1, 2]]
# [9 1]
NumPy treats boolean indices like integer indices, where the integers used are the indices of True elements. In other words, NumPy treats the boolean index array [True, False, True] just like the integer index array [0, 2] and it treats the boolean index array [False, True, True] just like the integer index array [1, 2].
So, foo[r13, c23] is equivalent to foo[[0, 2], [1, 2]]. Recall that when you combine row and column index arrays in this way, NumPy uses corresponding indices from each index array to select elements from the target array - in this case, elements (0,1) and (2,2).
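To see the boolean-to-integer correspondence directly, the sketch below uses np.flatnonzero to list the positions of the True values. It also shows one way (an illustration, not the only option) to get every combination of the selected rows and columns rather than the pairwise elements: index the rows first, then the columns.

```python
import numpy as np

foo = np.array([
    [3, 9, 7],
    [2, 0, 3],
    [3, 3, 1]
])
r13 = np.array([True, False, True])
c23 = np.array([False, True, True])

# Positions of the True values, i.e. the equivalent integer indices.
print(np.flatnonzero(r13))  # [0 2]
print(np.flatnonzero(c23))  # [1 2]

# Pairwise selection: elements (0, 1) and (2, 2).
pairwise = foo[r13, c23]

# Every combination of the selected rows and columns.
block = foo[r13][:, c23]
print(block)
# [[9 7]
#  [3 1]]
```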
Logical Operators¶
Logical operators let us combine boolean arrays. They include the "bitwise-and" operator, the "bitwise-or" operator, and the "bitwise-xor" operator.
b1 = np.array([False, False, True, True])
b2 = np.array([False, True, False, True])
b1 & b2 # [False, False, False, True], and
b1 | b2 # [False, True, True, True], or
b1 ^ b2 # [False, True, True, False], xor
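A common use of these operators is combining comparison results into a single mask. The sketch below is an illustration with made-up values; note the parentheses, which are needed because & has higher precedence than the comparison operators.

```python
import numpy as np

vals = np.array([3, 9, 7, 2, 0, 3])

# Parentheses are required: & binds more tightly than > and <.
mask = (vals > 2) & (vals < 9)
print(mask)        # [ True False  True False False  True]
print(vals[mask])  # [3 7 3]
```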
Boolean Negation¶
We can negate a boolean array by preceding it with a tilde ~.
~np.array([False, True])
# array([ True, False])
NaN¶
You can use NaN to represent missing or invalid values. NaN is a floating point constant that NumPy reserves and treats specially.
For example, consider this array called bot which contains two missing values.
bot = np.ones(shape = (3, 4))
bot[[0, 2], [1, 2]] = np.nan
print(bot)
# [[ 1. nan 1. 1.]
# [ 1. 1. 1. 1.]
# [ 1. 1. nan 1.]]
If you want to identify which elements of bot are NaN, you might be inclined to try bot == np.nan but the result may surprise you.
print(bot == np.nan)
# [[False False False False]
# [False False False False]
# [False False False False]]
NaN follows the IEEE 754 floating point standard, under which nan == nan returns False and nan != nan returns True.
np.nan == np.nan # False
np.nan != np.nan # True
This is because equivalence between missing or invalid values is not well-defined.
In order to see which elements of an array are NaN, you can use NumPy's isnan() function.
np.isnan(bot)
# array([[False, True, False, False],
# [False, False, False, False],
# [False, False, True, False]])
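Because isnan() returns a boolean array, it combines naturally with boolean indexing and negation. The following sketch counts, filters, and replaces the NaN values in bot; the names nan_mask, valid, and filled are ours.

```python
import numpy as np

bot = np.ones(shape=(3, 4))
bot[[0, 2], [1, 2]] = np.nan

nan_mask = np.isnan(bot)
print(nan_mask.sum())  # 2 (each True counts as 1)

# Keep only the non-NaN values (the result is 1-d).
valid = bot[~nan_mask]
print(valid.size)      # 10

# Replace NaNs with 0 on a copy, leaving bot untouched.
filled = bot.copy()
filled[nan_mask] = 0.0
```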
Caution
NaN is a special floating point constant, so it can only exist inside an array of floats. If you try inserting NaN into an array of integers, booleans, or strings, you'll get an error or unexpected behavior.
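For the integer case, a minimal sketch of the failure and one possible workaround (converting to float first) might look like this. The inserted flag is a hypothetical variable used only to record what happened.

```python
import numpy as np

ints = np.array([1, 2, 3])

try:
    ints[0] = np.nan
    inserted = True
except ValueError as err:
    # Integers have no representation for NaN.
    inserted = False
    print(err)

# Converting to float first makes room for NaN.
floats = ints.astype(float)
floats[0] = np.nan
print(floats)  # [nan  2.  3.]
```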
Infinity¶
Like NaN, NumPy reserves floating point constants for infinity and negative infinity that behave specially.
If you want to insert these values directly, you can use np.inf and -np.inf. (The older alias np.NINF was removed in NumPy 2.0.)
np.array([np.inf, -np.inf])
# array([ inf, -inf])
More commonly, these values occur when you divide a nonzero number by 0. (NumPy also emits a RuntimeWarning when this happens.)
np.array([-1, 1])/0
# array([-inf, inf])
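To detect infinite values, NumPy provides np.isinf() and np.isfinite(), analogous to np.isnan(). A short sketch with illustrative values:

```python
import numpy as np

vals = np.array([-np.inf, 1.0, np.inf])

print(np.isinf(vals))     # [ True False  True]
print(np.isfinite(vals))  # [False  True False]

# Arithmetic with infinity follows floating point rules.
print(np.inf + 1 == np.inf)       # True
print(np.isnan(np.inf - np.inf))  # True
```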
random¶
You can use NumPy's random module to shuffle arrays, sample values from arrays, and draw values from a host of probability distributions.
Generators¶
Since NumPy version 1.17.0, it is recommended to use a Generator to produce random values rather than the legacy functions that live directly on the random module (such as np.random.seed() and np.random.randint()).
In most cases, the default random number generator is sufficient.
Initialize default_rng without a seed
rng = np.random.default_rng()
Initialize default_rng with a seed
rng = np.random.default_rng(12345)
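The point of seeding is reproducibility. As a small sketch, two generators created with the same seed and asked for the same draws return identical values; the names rng_a and rng_b are ours.

```python
import numpy as np

rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

draws_a = rng_a.integers(low=0, high=100, size=5)
draws_b = rng_b.integers(low=0, high=100, size=5)

# Same seed, same sequence of calls, same values.
print(np.array_equal(draws_a, draws_b))  # True
```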
Examples¶
Sample integers in range with replacement¶
Draw three integers from the range 1 to 6, with replacement.
generator = np.random.default_rng(seed=123)
generator.integers(low=1, high=7, size=3)
# array([1, 5, 4])
Sample integers in range without replacement¶
Draw three integers from the range 0 to 9, without replacement.
generator = np.random.default_rng(seed=123)
generator.choice(a=10, size=3, replace=False)
# array([5, 6, 0])
Randomly permute the rows of a 2-d array¶
Randomly shuffle the rows of this 5x2 array, foo
foo = np.array([
[1, 2],
[3, 4],
[5, 6],
[7, 8],
[9, 10]
])
generator = np.random.default_rng(seed=123)
generator.permutation(foo, axis=0)
# array([[ 9, 10],
# [ 1, 2],
# [ 5, 6],
# [ 7, 8],
# [ 3, 4]])
See random.Generator.permutation
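One detail worth illustrating: permutation() returns a shuffled copy, while the related Generator.shuffle() method reorders the array in place. The sketch below checks both behaviors without relying on the specific order produced by the seed.

```python
import numpy as np

foo = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
original = foo.copy()

generator = np.random.default_rng(seed=123)

# permutation returns a shuffled copy; foo itself is unchanged.
shuffled = generator.permutation(foo, axis=0)
print(np.array_equal(foo, original))  # True

# Each row is kept intact; only the row order changes.
same_rows = sorted(map(tuple, shuffled)) == sorted(map(tuple, original))
print(same_rows)  # True

# shuffle reorders the rows of foo in place and returns None.
returned = generator.shuffle(foo, axis=0)
```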
Random sample from uniform distribution¶
Randomly sample four values between 1 and 2, then output as a 2x2 array.
generator = np.random.default_rng(seed=123)
generator.uniform(low=1.0, high=2.0, size=(2, 2))
# array([[1.68235186, 1.05382102],
# [1.22035987, 1.18437181]])
Random sample from normal distribution¶
Randomly sample two values from a standard normal distribution, then output as a length-2 1-d array.
generator = np.random.default_rng(seed=123)
generator.normal(loc=0.0, scale=1.0, size=2)
# array([-0.98912135, -0.36778665])
Random sample from binomial distribution¶
Randomly sample six values from a binomial distribution with n=10 and p=0.25, then output as a 3x2 array.
generator = np.random.default_rng(seed=123)
generator.binomial(n=10, p=0.25, size=(3, 2))
# array([[3, 0],
# [1, 1],
# [1, 4]])