I have the following batched RGB image array (4D array):
In [55]: img_arr = np.random.randint(0, 255, (10000, 32, 32, 3))
Now, I want to crop only certain dimensions, say (12x12), from the top-left corner across all 3 channels, and preferably along the batch dimension (i.e. axis 0) as well, all in one go. My idea was to produce a grid and just slice with it. So, I have constructed this grid:
In [56]: grid = np.c_[np.arange(12)]+ np.r_[np.arange(12)]
In [57]: grid.shape
Out[57]: (12, 12)
But, when I slice the array, I get something which is unexpected:
In [58]: img_arr[:, grid, :].shape
Out[58]: (10000, 12, 12, 32, 3)
I expected and need the result to be of shape (10000, 12, 12, 3), but I don't know where the 32 is coming from.
This is just an example. Ideally, I want to do this cropping at 10 different positions on the image, viz. top-left, top-right, bottom-left, bottom-right, etc. But once the top-left cropping works, the rest should follow intuitively.
Additionally, as you can see I need to store more than 100K images along the batch dimension in a single 4D array, so it'd be very nice to have a view when doing such random croppings, since it will be memory efficient.
Answer 0 (score: 3)
We can use slicing for this: we can specify a range for the second and third dimension like:
sub_img = img_arr[:, :12, :12 , :]
Then sub_img.shape == (10000, 12, 12, 3). Here we specify a range of 0 to 12 (the start 0 does not need to be stated explicitly, and the stop 12 is exclusive) for the second and third dimension. It is also quite declarative: we construct a sub_img where the first index takes everything (:), the second one takes the items up to the twelfth (:12), etc.
Note that we do not need to specify trailing :s; we can also write:
sub_img = img_arr[:, :12, :12] # no last ":"
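As a quick check of the difference between the two approaches, the sketch below compares the grid index from the question with plain slicing. It uses a smaller batch of 100 images (an assumption made only to keep the example fast). The grid is a 2-D integer array applied to axis 1 alone, so NumPy replaces that single axis with the grid's own (12, 12) shape and leaves axis 2 untouched, which is where the 32 comes from.

```python
import numpy as np

# Smaller batch than in the question, only to keep the example quick.
img_arr = np.random.randint(0, 255, (100, 32, 32, 3))

# A 2-D integer index array applied to axis 1 only: axis 1 is replaced by
# the index array's own shape (12, 12); axes 2 and 3 are left untouched.
grid = np.c_[np.arange(12)] + np.r_[np.arange(12)]
fancy = img_arr[:, grid, :]
print(fancy.shape)    # (100, 12, 12, 32, 3)

# Basic slicing restricts axes 1 and 2 independently.
sub_img = img_arr[:, :12, :12, :]
print(sub_img.shape)  # (100, 12, 12, 3)
```

Indexing with an integer array also always produces a copy, so it would not give the memory-efficient view asked for in the question.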
Note that this constructs a view; the array is not copied. So if we make changes in img_arr that fall within the range of the view, we will see them in sub_img, and vice versa. In case you need a copy, you can for instance pass the view through the np.array constructor:
sub_img = np.array(img_arr[:, :12, :12]) # making a copy, instead of a view
Using a view can be beneficial, however: storing a view takes almost no memory (here approximately 144 bytes, whereas a copy requires approximately 34 megabytes, assuming 8-byte integers), and constructing a view is almost instant (its cost usually scales with the number of dimensions), whereas the cost of a copy scales with the number of elements.
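The view/copy distinction can be verified directly. The following is a minimal sketch, again with a smaller batch of 100 images as an assumption for speed; it relies on np.shares_memory to test whether two arrays use the same underlying buffer.

```python
import numpy as np

# Smaller batch than in the question, only to keep the example quick.
img_arr = np.random.randint(0, 255, (100, 32, 32, 3))

view = img_arr[:, :12, :12]            # basic slicing gives a view
copy = np.array(img_arr[:, :12, :12])  # np.array makes an explicit copy

print(np.shares_memory(view, img_arr))  # True
print(np.shares_memory(copy, img_arr))  # False

# A write through the original is visible in the view but not in the copy.
img_arr[0, 0, 0, 0] = -1
print(view[0, 0, 0, 0])  # -1
print(copy[0, 0, 0, 0])  # still the original random value

# Number of bytes the copy has to store for its own data.
print(copy.nbytes)
```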
slice objects
In case the number of dimensions is arbitrary, for instance, we can also pass a tuple of slice objects.
For instance the first expression is equivalent to:
# equivalent to the first code fragment
indices = (slice(None), slice(12), slice(12))
sub_img = img_arr[indices]
So in case the number of dimensions is arbitrary, we can first construct such a tuple. A tuple that slices all dimensions to 12, except the first and the last, is for instance:
# generalized with arbitrary number of dimensions
indices = (slice(None), *(slice(12) for _ in range(img_arr.ndim - 2)))
sub_img = img_arr[indices]
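The same tuple-of-slices idea extends to crops at other positions, such as the corners mentioned in the question. The sketch below is one possible way to do it, not the only one: crop_slices is a hypothetical helper (not part of NumPy), and it assumes the layout (batch, height, width, ...) with a smaller batch of 100 images to keep it fast.

```python
import numpy as np

def crop_slices(ndim, top, left, size):
    """Hypothetical helper: index tuple for a size x size crop at (top, left).

    Assumes the layout (batch, height, width, ...): axis 0 is kept whole,
    axes 1 and 2 are cropped, and any remaining axes are kept whole.
    """
    rest = (slice(None),) * (ndim - 3)
    return (slice(None), slice(top, top + size), slice(left, left + size)) + rest

# Smaller batch than in the question, only to keep the example quick.
img_arr = np.random.randint(0, 255, (100, 32, 32, 3))
height, width, size = img_arr.shape[1], img_arr.shape[2], 12

# (top, left) offsets of the four corner crops.
corners = {
    "top-left": (0, 0),
    "top-right": (0, width - size),
    "bottom-left": (height - size, 0),
    "bottom-right": (height - size, width - size),
}
crops = {name: img_arr[crop_slices(img_arr.ndim, top, left, size)]
         for name, (top, left) in corners.items()}

for name, crop in crops.items():
    # Every crop has shape (100, 12, 12, 3) and shares memory with img_arr.
    print(name, crop.shape, np.shares_memory(crop, img_arr))
```

Because only slice objects are involved, each crop is a view, so holding all of them at once costs almost no additional memory.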