Proficient
where()
You can use NumPy's where() function as a vectorized form of "if array element meets condition, then x else y".
For example, given a 2-d array foo
foo = np.array([
[1,2,3],
[4,5,6]
])
We can create a corresponding array bar which displays "cat" where foo is even and "dog" where foo is odd.
np.where(foo % 2 == 0, 'cat', 'dog')
array([['dog', 'cat', 'dog'],
['cat', 'dog', 'cat']], dtype='<U3')
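The x and y arguments don't have to be scalars. In the sketch below (the names signed and capped are only illustrative), arrays are passed in, and where() broadcasts them against the condition:

```python
import numpy as np

foo = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

# x and y can be arrays instead of scalars: keep even values, negate odd ones.
signed = np.where(foo % 2 == 0, foo, -foo)
print(signed)
# [[-1  2 -3]
#  [ 4 -5  6]]

# Scalars and arrays can be mixed: replace values greater than 4 with 0.
capped = np.where(foo > 4, 0, foo)
print(capped)
# [[1 2 3]
#  [4 0 0]]
```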
Math Functions
sum()
Consider this 2-d array, foo.
foo = np.array(
[[5.0, 2.0, 9.0],
[1.0, 0.0, 2.0],
[1.0, 7.0, 8.0]]
)
There are numerous ways to take its sum with the sum() function.
Sum all the values of foo
np.sum(foo) # 35.0
Sum across axis 0 (column sums)
np.sum(foo, axis=0)
# array([ 7., 9., 19.])
Sum across axis 1 (row sums)
np.sum(foo, axis=1)
# array([16., 3., 16.])
If foo contains NaNs, sum() returns NaN.
foo[0, 0] = np.nan
np.sum(foo) # nan
There are numerous ways to exclude NaNs or treat them as 0s.
np.sum(foo, where=~np.isnan(foo))  # 30.0
Here, the where parameter tells the sum() function to include only the elements where ~np.isnan(foo) evaluates to True.
np.sum(np.nan_to_num(foo))  # 30.0
Here, nan_to_num() converts NaNs to 0 (by default) before summing.
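NumPy also provides nansum(), which treats NaNs as 0 in a single call. A minimal sketch, rebuilding foo with its first value set to NaN as above:

```python
import numpy as np

foo = np.array(
    [[5.0, 2.0, 9.0],
     [1.0, 0.0, 2.0],
     [1.0, 7.0, 8.0]]
)
foo[0, 0] = np.nan

# nansum() treats NaNs as 0 when summing
total = np.nansum(foo)
print(total)  # 30.0

# It also accepts an axis argument, just like sum()
col_sums = np.nansum(foo, axis=0)
print(col_sums.tolist())  # [2.0, 9.0, 19.0]
```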
Other Math Functions
Unsurprisingly, NumPy has numerous other math functions, including minimum(), maximum(), mean(), exp(), log(), floor(), and ceil().
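A quick tour of a few of these, using two small example arrays a and b chosen purely for illustration. minimum() and maximum() work element by element across two arrays, while mean() reduces an array to a single value:

```python
import numpy as np

a = np.array([1.5, 0.25, 2.0])
b = np.array([1.0, 0.5, 3.0])

# minimum() and maximum() compare two arrays element by element
print(np.minimum(a, b).tolist())  # [1.0, 0.25, 2.0]
print(np.maximum(a, b).tolist())  # [1.5, 0.5, 3.0]

# mean() reduces the array to a single value (optionally along an axis)
print(np.mean(a))  # 1.25

# floor() and ceil() round down / up element by element
print(np.floor(a).tolist())  # [1.0, 0.0, 2.0]
print(np.ceil(a).tolist())   # [2.0, 1.0, 2.0]

# exp() and log() are inverses, so this round trip recovers a (up to rounding)
print(np.allclose(np.exp(np.log(a)), a))  # True
```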
Truth Value Testing
all()
You can use the all() function to check if all the values in an array meet some condition.
foo = np.array([
[np.nan, 4.4],
[ 1.0, 3.2],
[np.nan, np.nan],
[ 0.1, np.nan]
])
Check if all the values are NaN
np.all(np.isnan(foo))
# False
Check if all the values in each row are NaN
np.all(np.isnan(foo), axis=1)
# array([False, False, True, False])
Check if all the values in each column are NaN
np.all(np.isnan(foo), axis=0)
# array([False, False])
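A common use of all() with an axis is filtering. This sketch (all_nan_rows and trimmed are illustrative names) drops the rows of foo that are entirely NaN:

```python
import numpy as np

foo = np.array([
    [np.nan, 4.4],
    [ 1.0, 3.2],
    [np.nan, np.nan],
    [ 0.1, np.nan]
])

# True for each row whose values are all NaN
all_nan_rows = np.all(np.isnan(foo), axis=1)

# Keep only the rows that contain at least one real number
trimmed = foo[~all_nan_rows]
print(trimmed.shape)  # (3, 2)
```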
any()
You can use the any() function to check if any of the values in an array meet some condition.
foo = np.array([
[np.nan, 4.4],
[ 1.0, 3.2],
[np.nan, np.nan],
[ 0.1, np.nan]
])
Check if any value is NaN
np.any(np.isnan(foo))
# True
Check if any value in each row is NaN
np.any(np.isnan(foo), axis=1)
# array([ True, False, True, True])
Check if any value in each column is NaN
np.any(np.isnan(foo), axis=0)
# array([ True, True])
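Likewise, any() with axis=1 makes it easy to keep only the complete rows. A minimal sketch (has_nan and complete are illustrative names):

```python
import numpy as np

foo = np.array([
    [np.nan, 4.4],
    [ 1.0, 3.2],
    [np.nan, np.nan],
    [ 0.1, np.nan]
])

# True for each row that contains at least one NaN
has_nan = np.any(np.isnan(foo), axis=1)

# Keep only the fully populated rows
complete = foo[~has_nan]
print(complete.tolist())  # [[1.0, 3.2]]
```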
concatenate()
You can use the concatenate() function to combine two or more arrays.
roux = np.zeros(shape = (3,2))
print(roux)
[[0. 0.]
[0. 0.]
[0. 0.]]
gumbo = np.ones(shape = (2,2))
print(gumbo)
[[1. 1.]
[1. 1.]]
Concatenate roux with a couple copies of itself row-wise.
np.concatenate((roux, roux, roux), axis=0)
# array([[0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.],
# [0., 0.]])
Concatenate roux with a couple copies of itself column-wise.
np.concatenate((roux, roux, roux), axis=1)
# array([[0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0., 0.]])
Concatenate roux and gumbo row-wise.
np.concatenate((roux, gumbo), axis=0)
# array([[0., 0.],
# [0., 0.],
# [0., 0.],
# [1., 1.],
# [1., 1.]])
When you concatenate arrays, they must have exactly the same shape except along the axis you're concatenating on.
For example, if we try to concatenate roux and gumbo column-wise, NumPy throws an error.
np.concatenate((roux, gumbo), axis = 1)
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 3 and the array at index 1 has size 2
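Two ways around that error, sketched below: give the arrays matching row counts (gumbo_tall is a hypothetical 3x2 stand-in for gumbo), or pass axis=None, which tells concatenate() to flatten every input before joining them so their shapes no longer need to line up:

```python
import numpy as np

roux = np.zeros(shape=(3, 2))
gumbo = np.ones(shape=(2, 2))

# Column-wise concatenation works once the row counts agree (3 and 3)
gumbo_tall = np.ones(shape=(3, 2))
wide = np.concatenate((roux, gumbo_tall), axis=1)
print(wide.shape)  # (3, 4)

# With axis=None, every input is flattened first, then joined end to end
flat = np.concatenate((roux, gumbo), axis=None)
print(flat.shape)  # (10,)
```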
Stacking
You can use vstack(), hstack(), and stack() to combine arrays.
vstack()
vstack() takes one argument: a sequence of arrays. You could describe its algorithm in pseudocode as
for each array in the sequence:
    if the array is 1-d:
        promote the array to 2-d by giving it a new front axis
if every array has the same shape excluding axis 0:
    concatenate the arrays along axis 0
else:
    throw an error
Visually, you could imagine vstack() as vertically stacking 1-d or 2-d arrays.
Examples
foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])
np.vstack((foo, bar))
# [['a' 'b']
# ['c' 'd']]
np.vstack((foo, bar, baz))
# [['a' 'b']
# ['c' 'd']
# ['e' 'f']]
np.vstack((baz, bingo))
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 2 and the array at index 1 has size 3
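The vstack() pseudocode translates fairly directly into Python. The function below, vstack_sketch, is a hypothetical illustration built on concatenate(), not NumPy's actual implementation, so its error messages may differ from the real vstack():

```python
import numpy as np

def vstack_sketch(arrays):
    """Hypothetical re-implementation of the vstack() pseudocode, for illustration only."""
    promoted = []
    for arr in arrays:
        arr = np.asarray(arr)
        # Promote 1-d arrays to 2-d by adding a new front axis: (n,) -> (1, n)
        if arr.ndim == 1:
            arr = arr[np.newaxis, :]
        promoted.append(arr)
    # concatenate() raises a ValueError if the shapes differ anywhere except axis 0
    return np.concatenate(promoted, axis=0)

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])

# Compare against NumPy's own vstack()
print(np.array_equal(vstack_sketch((foo, bar, baz)), np.vstack((foo, bar, baz))))  # True
```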
hstack()
hstack() takes one argument: a sequence of arrays. You could describe its algorithm in pseudocode as
if every array in the sequence is 1-d:
    concatenate the arrays along axis 0
else:
    if every array has the same shape excluding axis 1:
        concatenate arrays along axis 1
    else:
        throw an error
Visually, you could imagine hstack() as horizontally stacking 1-d or 2-d arrays.
Examples
foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])
bongo = np.array(
[['j', 'k'],
['l', 'm']]
)
np.hstack((foo, bar))
# ['a' 'b' 'c' 'd']
np.hstack((baz, bingo))
# [['e' 'f' 'g' 'h' 'i']]
np.hstack((foo, bingo))
# ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 1 dimension(s) and the array at index 1 has 2 dimension(s)
np.hstack((bingo, bongo))
# ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1 and the array at index 1 has size 2
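The hstack() pseudocode can be sketched the same way. hstack_sketch is a hypothetical helper written for illustration; it follows the pseudocode above rather than NumPy's source, and its error messages may differ from the real hstack():

```python
import numpy as np

def hstack_sketch(arrays):
    """Hypothetical re-implementation of the hstack() pseudocode, for illustration only."""
    arrays = [np.asarray(arr) for arr in arrays]
    if all(arr.ndim == 1 for arr in arrays):
        # 1-d arrays are simply joined end to end
        return np.concatenate(arrays, axis=0)
    # Otherwise join along axis 1; concatenate() raises a ValueError if the
    # arrays differ in dimensionality or in any dimension other than axis 1
    return np.concatenate(arrays, axis=1)

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
baz = np.array([['e', 'f']])
bingo = np.array([['g', 'h', 'i']])

print(hstack_sketch((foo, bar)))    # ['a' 'b' 'c' 'd']
print(hstack_sketch((baz, bingo)))  # [['e' 'f' 'g' 'h' 'i']]
```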
stack()
stack() takes two arguments:
- a sequence of arrays to combine
- axis, which tells stack() where to create the new axis along which to combine the arrays
You could describe its algorithm in pseudocode as
if every array is the same shape and axis is less than or equal to the dimensionality of the arrays:
    for each array:
        insert a new axis where specified
    concatenate the arrays along the new axis
else:
    throw an error
Examples
foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])
np.stack((foo, bar), axis=0)
# array([['a', 'b'],
# ['c', 'd']], dtype='<U1')
np.stack((foo, bar), axis=1)
# array([['a', 'c'],
# ['b', 'd']], dtype='<U1')
np.stack((foo, bar), axis=2)
# numpy.AxisError: axis 2 is out of bounds for array of dimension 2
np.stack((foo, bar), axis=-1)
# array([['a', 'c'],
# ['b', 'd']], dtype='<U1')
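Here is a hedged Python sketch of the stack() pseudocode. stack_sketch is a hypothetical helper: it inserts the new axis with expand_dims() and then joins along it with concatenate(). It raises a plain ValueError where NumPy raises AxisError (a subclass of ValueError), so the messages differ:

```python
import numpy as np

def stack_sketch(arrays, axis=0):
    """Hypothetical re-implementation of the stack() pseudocode, for illustration only."""
    arrays = [np.asarray(arr) for arr in arrays]
    # Every input must have exactly the same shape
    if any(arr.shape != arrays[0].shape for arr in arrays):
        raise ValueError("all input arrays must have the same shape")
    # The result has one more dimension than the inputs, so axis may be
    # anywhere from 0 to ndim (or the equivalent negative index)
    result_ndim = arrays[0].ndim + 1
    if not -result_ndim <= axis < result_ndim:
        raise ValueError(f"axis {axis} is out of bounds for array of dimension {result_ndim}")
    # Insert a new length-1 axis at the requested position in every array...
    expanded = [np.expand_dims(arr, axis) for arr in arrays]
    # ...then join the arrays along that new axis
    return np.concatenate(expanded, axis=axis)

foo = np.array(['a', 'b'])
bar = np.array(['c', 'd'])

for ax in (0, 1, -1):
    # Compare against NumPy's own stack() for each axis used in the examples
    print(ax, np.array_equal(stack_sketch((foo, bar), axis=ax), np.stack((foo, bar), axis=ax)))
```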
Sorting
You can use NumPy's sort() function to sort the elements of an array.
sort() takes three primary parameters:
- a: the array you want to sort
- axis: the axis along which to sort (the default, -1, sorts along the last axis)
- kind: the sorting algorithm to use. By default, NumPy uses quicksort.
For example, here we make a 1-d array, foo, and then sort it in ascending order.
foo = np.array([1, 7, 3, 9, 0, 9, 1])
np.sort(foo)
# array([0, 1, 1, 3, 7, 9, 9])
Note that the original array remains unchanged.
foo
# array([1, 7, 3, 9, 0, 9, 1])
If you want to sort the values of foo in place, use the sort method of the array object.
foo.sort()
foo
# array([0, 1, 1, 3, 7, 9, 9])
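One pitfall with the in-place method: it returns None rather than the sorted array, so writing foo = foo.sort() would replace foo with None. A quick check (result is just an illustrative name):

```python
import numpy as np

foo = np.array([1, 7, 3, 9, 0, 9, 1])

# The sort() method sorts foo in place and returns None,
# so don't assign its result back to foo
result = foo.sort()
print(result)  # None
print(foo)     # [0 1 1 3 7 9 9]
```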
Sort with NaN
If you have an array with NaN values, sort() pushes them to the end of the array.
bar = np.array([5, np.nan, 3, 11])
np.sort(bar)
# array([ 3., 5., 11., nan])
Sort In Descending Order (Reverse Sort)
Unfortunately, NumPy doesn't have a direct way of sorting arrays in descending order. However, there are multiple ways to accomplish this.
- Sort the array in ascending order and then reverse the result.
bar = np.array([5, np.nan, 3, 11])
np.sort(bar)[::-1]  # array([nan, 11., 5., 3.])
- Negate the array's values, sort those in ascending order, and then negate that result.
bar = np.array([5, np.nan, 3, 11])
-np.sort(-bar)  # array([11., 5., 3., nan])
The main difference between these techniques is that the first method pushes NaNs to the front of the array and the second method pushes NaNs to the back. Also, the second method won't work on strings since you can't negate a string.
Sorting A Multidimensional Array
What if you wanted to sort a multidimensional array like this?
boo = np.array([
[55, 10, 12],
[20, 0, 33],
[55, 92, 3]
])
In this case, you can use the axis parameter of the sort() function to specify which axis to sort along.
Sort each column of a 2-d array
np.sort(boo, axis=0) # sort along the row axis
# array([[20, 0, 3],
# [55, 10, 12],
# [55, 92, 33]])
Sort each row of a 2-d array
np.sort(boo, axis=1) # sort along the column axis
# array([[10, 12, 55],
# [ 0, 20, 33],
# [ 3, 55, 92]])
Sort the last axis of an array
np.sort(boo, axis=-1)  # sort along the last axis
# array([[10, 12, 55],
# [ 0, 20, 33],
# [ 3, 55, 92]])
Since boo is a 2-d array, the last axis, 1, is the column axis. Thus np.sort(boo, axis=-1) is equivalent to np.sort(boo, axis=1).
Tip
When we talk about sorting along an axis, each element's position in the array remains fixed except along that axis. For example, observe the 20 in boo. When we sort along the row axis (axis 0), only its row coordinate changes (from (1,0) to (0,0)). When we sort along the column axis (axis 1), only its column coordinate changes (from (1,0) to (1,1)). That's why sorting along axis 0 does column sorts in a 2-d array and sorting along axis 1 does row sorts in a 2-d array.
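sort() also accepts axis=None, which flattens the array before sorting. For boo, that yields a single sorted 1-d array (flat_sorted is an illustrative name):

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

# axis=None flattens the array before sorting, returning a sorted 1-d array
flat_sorted = np.sort(boo, axis=None)
print(flat_sorted)  # [ 0  3 10 12 20 33 55 55 92]
```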
argsort()
argsort() works just like sort(), except that instead of the sorted values it returns the indices that would sort the array: the first entry is the index of the smallest element, the second entry is the index of the second smallest element, and so on.
Example
foo = np.array([3, 0, 10, 5])
np.argsort(foo)
# array([1, 0, 3, 2])
argsort() tells us:
- the smallest element of foo is at position 1
- the second smallest element of foo is at position 0
- the third smallest element of foo is at position 3
- the fourth smallest element of foo is at position 2
If you used this array to index the original array, you'd get its sorted form (just as if you had called np.sort(foo)).
foo = np.array([3, 0, 10, 5])
idx = np.argsort(foo)
foo[idx]
# array([ 0, 3, 5, 10])
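Note that argsort() tells you where each sorted value came from, not each element's rank. If you want each element's rank (the position it would land at after sorting), a common trick is to apply argsort() twice. A sketch using a hypothetical array scores:

```python
import numpy as np

scores = np.array([30, 10, 20])

# Indices that would sort scores: 10 is at index 1, 20 at index 2, 30 at index 0
order = np.argsort(scores)
print(order)  # [1 2 0]

# Arg-sorting those indices gives each element's rank in the sorted order
ranks = np.argsort(order)
print(ranks)  # [2 0 1]
```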
Sort the rows of a 2-d array according to its first column
boo = np.array([
[55, 10, 12],
[20, 0, 33],
[55, 92, 3]
])
If you want to reorder the rows of boo according to the values in its first column, you can plug in the index array [1, 0, 2].
idx = np.array([1, 0, 2])
boo[idx]
# array([[20, 0, 33],
# [55, 10, 12],
# [55, 92, 3]])
To create the index array dynamically, simply call argsort() on the first column of boo.
idx = np.argsort(boo[:, 0])
print(idx)
# [1 0 2]
boo[idx]
# array([[20, 0, 33],
# [55, 10, 12],
# [55, 92, 3]])
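To break ties in the first column using another column, NumPy's lexsort() takes a sequence of keys and treats the last key as the primary one. For example, to order the rows of boo by column 0 and then, among equal values, by column 2:

```python
import numpy as np

boo = np.array([
    [55, 10, 12],
    [20, 0, 33],
    [55, 92, 3]
])

# lexsort() treats the LAST key as the primary sort key,
# so this sorts by column 0 and breaks ties with column 2
idx = np.lexsort((boo[:, 2], boo[:, 0]))
print(idx)  # [1 2 0]
print(boo[idx].tolist())  # [[20, 0, 33], [55, 92, 3], [55, 10, 12]]
```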
Stable Sorting
The previous example raises an important question. If an array has repeated values, how do we guarantee that sorting them won't alter the order in which they appear in the original array? For example, given boo
boo = np.array([
[55, 10, 12],
[20, 0, 33],
[55, 92, 3]
])
this
boo[[1, 0, 2]]
# array([[20, 0, 33],
# [55, 10, 12],
# [55, 92, 3]])
and this
boo[[1, 2, 0]]
# array([[20, 0, 33],
# [55, 92, 3],
# [55, 10, 12]])
are both valid sorts of boo along its first column, but only the first array retains the original order of the rows beginning with 55. A sorting algorithm that always preserves the original relative order of equal values is known as a stable sorting algorithm. By default, np.sort() and np.argsort() don't use a stable sorting algorithm. If you'd like to use a stable sort, set the kind parameter equal to 'stable'.
boo[np.argsort(boo[:, 0], kind='stable')]
# array([[20, 0, 33],
# [55, 10, 12],
# [55, 92, 3]])
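To see the guarantee that kind='stable' provides, the sketch below uses a hypothetical array keys with many repeated values and checks that, within each group of equal keys, the original indices come out in increasing order:

```python
import numpy as np

# Many repeated keys, so ties are everywhere
keys = np.array([2, 1, 2, 1, 2, 1] * 10)

idx = np.argsort(keys, kind='stable')

# With a stable sort, the original indices of equal keys stay in increasing order
ones = idx[keys[idx] == 1]
twos = idx[keys[idx] == 2]
print(np.all(np.diff(ones) > 0), np.all(np.diff(twos) > 0))  # True True
```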
unique()
You can use the unique() function to get the unique elements of an array.
Example
gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
np.unique(gar)
# array(['a', 'b', 'c'], dtype='<U1')
You may have noticed that 'b' appeared first in the input but 'a' appeared first in the output. That's because unique() returns the unique elements in sorted order.
Get unique elements in order of first occurrence
You can use return_index=True to get the index of the first occurrence of each element in an array.
gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
np.unique(gar, return_index=True)
# (array(['a', 'b', 'c'], dtype='<U1'), array([2, 0, 4]))
With return_index=True, NumPy returns a tuple containing
- the unique elements array
- a corresponding array with the index at which each element first occurred in the original array
In the above example, 'a' first occurred at index 2 in the original array, 'b' first occurred at index 0, and so on.
If you want to reorder the unique elements in the same order they occurred in the original array, call argsort() on the index array and use the result to reorder the unique elements array.
gar = np.array(['b', 'b', 'a', 'a', 'c', 'c'])
uniques, first_positions = np.unique(gar, return_index=True)
uniques[np.argsort(first_positions)]
# array(['b', 'a', 'c'], dtype='<U1')
unique() with counts
You can use return_counts=True to additionally return the count of each element.
np.unique(gar, return_counts=True)
# (array(['a', 'b', 'c'], dtype='<U1'), array([2, 2, 2]))
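Combined with argmax(), the counts make it easy to find the most frequent value. A minimal sketch using a hypothetical array votes:

```python
import numpy as np

votes = np.array(['b', 'a', 'b', 'c', 'b', 'a'])
values, counts = np.unique(votes, return_counts=True)
print(values.tolist(), counts.tolist())  # ['a', 'b', 'c'] [2, 3, 1]

# argmax() on the counts picks out the most frequent value
most_common = values[np.argmax(counts)]
print(most_common)  # b
```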