Crop different portions of image from 4D array for data augmentation

Time: 2018-01-23 19:37:00

Tags: python numpy image-processing multidimensional-array computer-vision

I have the following batched RGB image array (4D array):

In [55]: img_arr = np.random.randint(0, 255, (10000, 32, 32, 3))

Now, I want to crop a fixed-size region, say 12x12, from the top-left corner across all 3 channels, and preferably along the batch dimension (i.e. axis 0) as well, all in one go. My idea was to produce a grid and index the array with it. So, I have constructed this grid:

In [56]: grid = np.c_[np.arange(12)]+ np.r_[np.arange(12)]

In [57]: grid.shape
Out[57]: (12, 12)

But, when I slice the array, I get something which is unexpected:

In [58]: img_arr[:, grid, :].shape
Out[58]: (10000, 12, 12, 32, 3)

I expected and need the result to be of shape (10000, 12, 12, 3) but I don't know where the 32 is coming from.

This is just an example. Ideally, I want to do this cropping at 10 different positions on the image, viz. top-left, top-right, bottom-left, bottom-right, etc.

But once the top-left cropping works, the rest should follow intuitively.

Additionally, as you can see I need to store more than 100K images along the batch dimension in a single 4D array, so it'd be very nice to have a view when doing such random croppings, since it will be memory efficient.

1 answer:

Answer 0 (score: 3)

We can use plain slicing for this, specifying a range for the second and third dimensions:

sub_img = img_arr[:, :12, :12 , :]

Then sub_img.shape == (10000, 12, 12, 3). Here we specify the half-open range 0 to 12, that is indices 0 through 11 (the 0 does not need to be written explicitly), for the second and third dimensions. It is also quite declarative: we construct a sub_img where the first index takes everything (:), the second takes the first twelve items (:12), and so on.
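As for where the 32 in the question comes from, the following is a small sketch rather than a definitive explanation. It assumes a smaller batch of 100 images to keep it light. A single integer index array replaces only the one axis it is applied to with its own shape, so the width axis is left untouched:

```python
import numpy as np

# Smaller batch than in the question, only to keep the example light.
img_arr = np.random.randint(0, 255, (100, 32, 32, 3))

# The grid from the question: a (12, 12) array of integer indices.
grid = np.c_[np.arange(12)] + np.r_[np.arange(12)]

# One index array replaces ONE axis (axis 1) with the array's own shape,
# (12, 12). Axis 2 (the width, 32) is still taken in full by the ":".
fancy = img_arr[:, grid, :]
print(fancy.shape)   # (100, 12, 12, 32, 3)

# Basic slicing restricts axis 1 and axis 2 independently.
sliced = img_arr[:, :12, :12, :]
print(sliced.shape)  # (100, 12, 12, 3)
```

Indexing with an integer array also produces a copy, not a view, so slicing fits the memory requirement from the question better.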

Note that we do not need to specify trailing colons; we can also write:

sub_img = img_arr[:, :12, :12]  # no last ":"
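The same slicing extends to the other crop positions mentioned in the question. Below is a minimal sketch, assuming a 12x12 crop and a smaller batch of 100 images. The `offsets` table and its position names are made up for illustration and are not part of NumPy:

```python
import numpy as np

img_arr = np.random.randint(0, 255, (100, 32, 32, 3))
crop = 12                    # crop size, as in the question
h, w = img_arr.shape[1:3]    # image height and width (32, 32)

# Hypothetical table: position name -> (row offset, column offset).
offsets = {
    "top_left": (0, 0),
    "top_right": (0, w - crop),
    "bottom_left": (h - crop, 0),
    "bottom_right": (h - crop, w - crop),
    "center": ((h - crop) // 2, (w - crop) // 2),
}

# Each crop is a basic slice, so each one is a view into img_arr.
crops = {
    name: img_arr[:, r:r + crop, c:c + crop]
    for name, (r, c) in offsets.items()
}

for name, sub in crops.items():
    print(name, sub.shape)
```

Every entry has shape (100, 12, 12, 3). Note that this applies one offset to the whole batch; a different random offset per image cannot be expressed as a single basic slice.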

Slices as views

Note that slicing constructs a view here; the array is not copied. So if we make changes in img_arr that fall within the range of the view, we will see them in sub_img, and vice versa. In case you need a copy, you can for instance pass the view through the array constructor:

sub_img = np.array(img_arr[:, :12, :12])  # making a copy, instead of a view

Using a view can be beneficial, however, since the view itself takes almost no memory (here approximately 144 bytes for the array object, whereas a copy requires approximately 34 megabytes with 8-byte integers: 10000 × 12 × 12 × 3 × 8 bytes). Furthermore, constructing a view is almost instantaneous (the cost scales with the number of dimensions), whereas the cost of a copy scales with the number of elements.
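The view/copy distinction can be checked directly. This is a small sketch, again assuming a smaller batch of 100 images, that uses np.shares_memory and a write to the original array:

```python
import numpy as np

img_arr = np.random.randint(0, 255, (100, 32, 32, 3))

view = img_arr[:, :12, :12]            # basic slicing: a view
copy = np.array(img_arr[:, :12, :12])  # explicit copy of the same region

print(np.shares_memory(view, img_arr))  # True
print(np.shares_memory(copy, img_arr))  # False

# A write to the original is visible through the view, not the copy.
old_value = img_arr[0, 0, 0, 0]
img_arr[0, 0, 0, 0] = -1
print(view[0, 0, 0, 0])                 # -1
print(copy[0, 0, 0, 0] == old_value)    # True

# Size of the copied data in bytes: number of elements times item size.
print(copy.nbytes)
```

Keep in mind that a view keeps the whole original array alive, so it only saves memory as long as the original is needed anyway.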

Arbitrary number of dimensions using slice objects

If the number of dimensions is not known in advance, we can also index with a tuple of slice objects.

For instance, the first expression is equivalent to:

# equivalent to the first code fragment
indices = (slice(None), slice(12), slice(12))
sub_img = img_arr[indices]

So in case the number of dimensions is arbitrary, we can first construct such a tuple. For instance, a tuple that slices every dimension to 12 except the first and the last (the last is simply left out of the tuple, so it is kept in full) is:

# generalized with arbitrary number of dimensions
indices = (slice(None), *(slice(12) for _ in range(img_arr.ndim - 2)))
sub_img = img_arr[indices]
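The tuple construction above can be wrapped in a small function. This is only a sketch: `crop_spatial` and its `start` parameter are hypothetical names introduced here, and it assumes the same offset is wanted along every cropped axis:

```python
import numpy as np

def crop_spatial(arr, size, start=0):
    """Crop every axis except the first and last to [start, start + size).

    Hypothetical helper for illustration; it returns a view of arr.
    """
    middle = (slice(start, start + size) for _ in range(arr.ndim - 2))
    # The last axis is omitted from the tuple, so it is kept in full.
    return arr[(slice(None), *middle)]

images = np.zeros((8, 32, 32, 3))        # batch of 2D images
volumes = np.zeros((8, 32, 32, 32, 1))   # batch of 3D volumes

print(crop_spatial(images, 12).shape)             # (8, 12, 12, 3)
print(crop_spatial(volumes, 12, start=20).shape)  # (8, 12, 12, 12, 1)
```

Because the index tuple contains only slice objects, the result is still a view, as with the hand-written slices.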