Advanced
Advanced Array Indexing¶
Earlier we discussed basic array indexing techniques. Here we discuss advanced techniques and take a deeper dive into how array indexing works.
We start by setting up a 3x2x4 array of integers, foo
.
foo = np.arange(3*2*4).reshape((3,2,4))
print(foo)
# [[[ 0 1 2 3]
# [ 4 5 6 7]]
# [[ 8 9 10 11]
# [12 13 14 15]]
# [[16 17 18 19]
# [20 21 22 23]]]
Observe the result of foo[:,:,0]
print(foo[:,:,0])
# [[ 0 4]
# [ 8 12]
# [16 20]]
Recall the mental model for interpreting an N-dimensional array.
- 1-d array: imagine a row of values
- 2-d array: imagine a matrix of values
- 3-d array: imagine a row of matrices
- 4-d array: imagine a matrix of matrices
In this case, we can think of foo
as a row of matrices where element (i,j,k)
corresponds to the ith matrix,
the jth row, and the kth column. So, foo[:,:,0]
is requesting "every matrix, every row, column 0" resulting in
this sub-array.
# [[ 0 4]
# [ 8 12]
# [16 20]]
What's surprising is that we started with a 3-dimensional array, but foo[:,:,0]
returned a 2-dimensional array.
When we select more than one column, for example like this
print(foo[:,:,[0,1]])
# [[[ 0 1]
# [ 4 5]]
# [[ 8 9]
# [12 13]]
# [[16 17]
# [20 21]]]
the result is a 3-dimensional array as you might expect.
Subsetting with same-shape index arrays¶
Let's see another example, this time using 3x2 index arrays for i, j, and k indexes of foo
.
i = [
[0,2],
[2,0],
[1,1]
]
j = [
[0,0],
[0,0],
[1,1]
]
k = [
[0,1],
[0,2],
[0,3]
]
print(foo[i,j,k])
# [[ 0 17]
# [16 2]
# [12 15]]
Important
When every dimension is indexed with an array (or list) and each of those arrays is the same shape, the output array will be the same shape as the index arrays.
In this case each of our index arrays is 3x2, so we know our result will also be a 3x2 array.
To understand the values in the result, you could imagine zipping the three index arrays into a matrix of 3-element tuples like this
[[(0,0,0), (2,0,1)],
[(2,0,0), (0,0,2)],
[(1,1,0), (1,1,3)]]
Here, each tuple gives the location of the corresponding output element.
Here's another example where we use 2x2x2x2 index arrays to extract values from foo
. Notice the result is also
2x2x2x2.
idx = np.zeros(shape=(2,2,2,2), dtype='int64')
result = foo[idx, idx, idx]
result.shape # (2, 2, 2, 2)
Again, the shape of the output is dependent on the shape of the index arrays, not the shape of the array you're indexing into.
Subsetting with varying-shape index arrays¶
We've considered the case where every dimension has an index array and each of the index arrays are the same shape. What if every dimension has an index array, but the index arrays have different shapes?
i = [0,1]
j = [0,1]
k = [[0],
[2],
[3]]
print(foo[i,j,k])
# [[ 0 12]
# [ 2 14]
# [ 3 15]]
In this case, NumPy broadcasts the index arrays, in essence forming three 2x3 index arrays.
i = [[0, 1],
[0, 1],
[0, 1]]
j = [[0, 1],
[0, 1],
[0, 1]]
k = [[0, 0],
[2, 2],
[3, 3]]
Tip
You can do np.broadcast_arrays(i,j,k)
to see how the original index arrays broadcast into identically shaped
index arrays. See broadcast_arrays()
.
Subsetting foo
with equivalently shaped index arrays is straight-forward based on what we already covered.
Subsetting with sice indexers¶
Lastly, we consider the case where our array contains slice indexers :
. For example,
i = [0,0,2,2]
k = [[0],
[1],
[2]]
print(foo[i,:,k])
# [[[ 0 4]
# [ 0 4]
# [16 20]
# [16 20]]
# [[ 1 5]
# [ 1 5]
# [17 21]
# [17 21]]
# [[ 2 6]
# [ 2 6]
# [18 22]
# [18 22]]]
What's going on here? The trick to understanding these complex scenarios is to build fully expanded, same-shaped index arrays for each dimension as follows:
broadcast all index arrays
for each slicer:
Copy each index array N times along a new last axis where N is the size of the current slicer’s dimension
Represent the current slicer with an index array
Let's walk through that last example. We start by broadcasting the index arrays into the following (3,4) arrays.
i = [[0, 0, 2, 2],
[0, 0, 2, 2],
[0, 0, 2, 2]]
k = [[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2]]
Now we create an index array for j. Start with a (3,4) array of ?
s.
j = [[?, ?, ?, ?],
[?, ?, ?, ?],
[?, ?, ?, ?]]
Since j spans the second axis of foo
which is length-2, replace each ?
with the indices [0,1]
.
j = [[[0,1], [0,1], [0,1], [0,1]],
[[0,1], [0,1], [0,1], [0,1]],
[[0,1], [0,1], [0,1], [0,1]]]
j is then promoted from a (3,4) array to a (3,4,2) array.
i and k are then each copied twice along a new last axis to match j's shape. The final (3,4,2) index arrays are
i = [[[0, 0],
[0, 0],
[2, 2],
[2, 2]],
[[0, 0],
[0, 0],
[2, 2],
[2, 2]],
[[0, 0],
[0, 0],
[2, 2],
[2, 2]]]
j = [[[0, 1],
[0, 1],
[0, 1],
[0, 1]],
[[0, 1],
[0, 1],
[0, 1],
[0, 1]],
[[0, 1],
[0, 1],
[0, 1],
[0, 1]]]
k = [[[0, 0],
[0, 0],
[0, 0],
[0, 0]],
[[1, 1],
[1, 1],
[1, 1],
[1, 1]],
[[2, 2],
[2, 2],
[2, 2],
[2, 2]]]
Note
NumPy doesn't actually expand the index arrays in this way as it'd be innefficient, but NumPy's indexing algorithm is logically equivalent to this type of index array expansion.
Note
The slicer we used in this example :
is a full slice but we could use fancier slices like ::2
which gets every
second element or :3:-1
which gets the first three elements in reverse order.
Trailing Indexes¶
It's also worth mentioning that you can exclude trailing indexes in which case NumPy assumes you want all the values
from those excluded dimensions. For example, foo[[0,1]]
is equivalent to foo[[0,1], :, :]
.
Indexing with a scalar¶
Recall foo[:,:,0]
returns a two-dimensional array even though foo
is a three-dimensional array. Why?
Recall the algorithm for determining the output of a subset with slicers and index arrays.
broadcast all index arrays
for each slicer:
Copy each index array N times along a new last axis where N is the size of the current slicer’s dimension
Represent the current slicer with an index array
We start by broadcasting all the index arrays. In this case there's only one index array - 0 - so we don't have to do any broadcasting, but what's the shape of a scalar like 0? It's actually empty or 0-dimensional. You can see this by calling
np.array(0).shape # ()
So we have an empty dimensional index, (). Then we concatenate the size of each dimension with a slice, so we end up with a (3,2) result.
If we had wrapped the 0 index in square brackets, the final result would be a (3,2,1) array.
foo[:,:,[0]].shape # (3, 2, 1)
View vs Copy¶
It's important to understand when array indexing produces a view and when it produces a copy.
Suppose we have this 2d array, spongebob
spongebob = np.arange(12).reshape(3,-1)
print(spongebob)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
We can pick out the first two columns using array indexers or slicing.
squidward = spongebob[:, [0,1]]
patrick = spongebob[:, :2]
On the surface these techniques seem to produce the same result
np.all(patrick == squidward) # True
but under the hood, these techniques are subtly different. squidward
is a copy of spongebob
while patrick
is a
view of spongebob
.
Observe that if we modify squidward
, spongebob
is unchanged.
squidward[0,0] = 100
print(spongebob)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
By contrast, if we modify patrick
, spongebob
changes too.
patrick[0,0] = 100
print(spongebob)
# [[100 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
Array memory addresses¶
Another way to observe this is by looking up the memory address of the beginning of each array.
spongebob.__array_interface__ # {'data': (105553169745024, False), ...
squidward.__array_interface__ # {'data': (105553142497328, False), ...
patrick.__array_interface__ # {'data': (105553169745024, False), ...
It's clear that spongebob
and patrick
live at the same memory address, but the location of squidward
is elsewhere.
Caution
The memory addresses listed here are the address of the beginning of each array. If two addresses are different, the arrays don't start at the same place, but portions of the arrays may still overlap!
A better technique to check if arrays share memory is to use NumPy's shares_memory()
function.
np.shares_memory(spongebob, squidward) # False
np.shares_memory(spongebob, patrick) # True
How to force copy an array¶
In, general, when you subset an array using nothing but slices, NumPy returns a view of the original array. You
can force numpy to copy the data using the array copy()
method like this.
spongebob[:, :2].copy()
Note
When you subset an array using at least one index array, numpy will automatically copy the data.