我需要转置h5py数据集,以便将3D数组作为一叠2D图像进行访问。
我希望能够在3个可能的方向中切割3D体积,同时将第一个维度保留为图像索引。
我不想将我的数据集投射到一个numpy数组中,以避免在只需要显示某些图像的情况下从磁盘读取整个数据集。
答案 0 :(得分:1)
这是一个使用代理对象的解决方案,该代理对象在数据集的__getitem__
方法之上添加一个图层以考虑转置。它应该适用于任何数量的维度,但仅在3D中进行了广泛的测试。
示例:
my_3D_dataset_201_transposition = TransposedDatasetView(
my_3D_dataset,
transposition=(2, 0, 1))
assert my_3D_dataset[i, j, k] == my_3D_dataset_201_transposition[k, i, j]
我的课程定义如下:
class TransposedDatasetView(object):
"""
This class provides a way to transpose a dataset without
casting it into a numpy array. This way, the dataset in a file need not
necessarily be integrally read into memory to view it in a different
transposition.
.. note::
The performances depend a lot on the way the dataset was written
to file. Depending on the chunking strategy, reading a complete 2D slice
in an unfavorable direction may still require the entire dataset to
be read from disk.
:param dataset: h5py dataset
:param transposition: List of dimension numbers in the wanted order
"""
def __init__(self, dataset, transposition=None):
"""
"""
super(TransposedDatasetView, self).__init__()
self.dataset = dataset
"""original dataset"""
self.shape = dataset.shape
"""Tuple of array dimensions"""
self.dtype = dataset.dtype
"""Data-type of the array’s element"""
self.ndim = len(dataset.shape)
"""Number of array dimensions"""
size = 0
if self.ndim:
size = 1
for dimsize in self.shape:
size *= dimsize
self.size = size
"""Number of elements in the array."""
self.transposition = list(range(self.ndim))
"""List of dimension indices, in an order depending on the
specified transposition. By default this is simply
[0, ..., self.ndim], but it can be changed by specifying a different
`transposition` parameter at initialization.
Use :meth:`transpose`, to create a new :class:`TransposedDatasetView`
with a different :attr:`transposition`.
"""
if transposition is not None:
assert len(transposition) == self.ndim
assert set(transposition) == set(list(range(self.ndim))), \
"Transposition must be a list containing all dimensions"
self.transposition = transposition
self.__sort_shape()
def __sort_shape(self):
"""Sort shape in the order defined in :attr:`transposition`
"""
new_shape = tuple(self.shape[dim] for dim in self.transposition)
self.shape = new_shape
def __sort_indices(self, indices):
"""Return array indices sorted in the order needed
to access data in the original non-transposed dataset.
:param indices: Tuple of ndim indices, in the order needed
to access the view
:return: Sorted tuple of indices, to access original data
"""
assert len(indices) == self.ndim
sorted_indices = tuple(idx for (_, idx) in
sorted(zip(self.transposition, indices)))
return sorted_indices
def __getitem__(self, item):
"""Handle fancy indexing with regards to the dimension order as
specified in :attr:`transposition`
The supported fancy-indexing syntax is explained at
http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing.
Additional restrictions exist if the data has been transposed:
- numpy boolean array indexing is not supported
- ellipsis objects are not supported
:param item: Index, possibly fancy index (must be supported by h5py)
:return:
"""
# no transposition, let the original dataset handle indexing
if self.transposition == list(range(self.ndim)):
return self.dataset[item]
# 1-D slicing -> n-D slicing (n=1)
if not hasattr(item, "__len__"):
# first dimension index is given
item = [item]
# following dimensions are indexed with : (all elements)
item += [slice(0, sys.maxint, 1) for _i in range(self.ndim - 1)]
# n-dimensional slicing
if len(item) != self.ndim:
raise IndexError(
"N-dim slicing requires a tuple of N indices/slices. " +
"Needed dimensions: %d" % self.ndim)
# get list of indices sorted in the original dataset order
sorted_indices = self.__sort_indices(item)
output_data_not_transposed = self.dataset[sorted_indices]
# now we must transpose the output data
output_dimensions = []
frozen_dimensions = []
for i, idx in enumerate(item):
# slices and sequences
if not isinstance(idx, int):
output_dimensions.append(self.transposition[i])
# regular integer index
else:
# whenever a dimension is fixed (indexed by an integer)
# the number of output dimension is reduced
frozen_dimensions.append(self.transposition[i])
# decrement output dimensions that are above frozen dimensions
for frozen_dim in reversed(sorted(frozen_dimensions)):
for i, out_dim in enumerate(output_dimensions):
if out_dim > frozen_dim:
output_dimensions[i] -= 1
assert (len(output_dimensions) + len(frozen_dimensions)) == self.ndim
assert set(output_dimensions) == set(range(len(output_dimensions)))
return numpy.transpose(output_data_not_transposed,
axes=output_dimensions)
def __array__(self, dtype=None):
"""Cast the dataset into a numpy array, and return it.
If a transposition has been done on this dataset, return
a transposed view of a numpy array."""
return numpy.transpose(numpy.array(self.dataset, dtype=dtype),
self.transposition)
def transpose(self, transposition=None):
"""Return a re-ordered (dimensions permutated)
:class:`TransposedDatasetView`.
The returned object refers to
the same dataset but with a different :attr:`transposition`.
:param list[int] transposition: List of dimension numbers in the wanted order
:return: Transposed TransposedDatasetView
"""
# by default, reverse the dimensions
if transposition is None:
transposition = list(reversed(self.transposition))
return TransposedDatasetView(self.dataset,
transposition)
答案 1 :(得分:1)
有一个类似的问题,并且在很大程度上受到PiRK的响应的启发,我的团队为此分发了可点子安装的软件包:https://pypi.org/project/lazy-ops/