Transposing an h5py dataset without casting it to a numpy array

Asked: 2016-12-16 09:29:05

Tags: python arrays h5py

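I need to transpose an h5py dataset so that a 3D array can be accessed as a stack of 2D images.

I want to be able to slice the 3D volume along any of the 3 possible orientations, while keeping the first dimension as the image index.

I don't want to cast my dataset into a numpy array, because that would read the whole dataset from disk even when only a few images need to be displayed.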

2 Answers:

Answer 0: (score: 1)

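Here is a solution using a proxy object that adds a layer on top of the dataset's __getitem__ method to account for the transposition. It should work for any number of dimensions, but it has only been tested extensively in 3D.

Example: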

my_3D_dataset_201_transposition = TransposedDatasetView(
        my_3D_dataset,
        transposition=(2, 0, 1))
assert my_3D_dataset[i, j, k] == my_3D_dataset_201_transposition[k, i, j]
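
To spell out the convention used in the example: axis n of the view corresponds to axis transposition[n] of the original dataset. The following is a minimal pure-Python sketch of that mapping, not code from the class itself; the helper names view_shape and to_original_indices and the shape (10, 20, 30) are made up for illustration.

```python
def view_shape(shape, transposition):
    """Shape of the view: view axis n is original axis transposition[n]."""
    return tuple(shape[dim] for dim in transposition)


def to_original_indices(transposition, view_indices):
    """Reorder indices given in view order into the original axis order."""
    original = [None] * len(transposition)
    for view_axis, original_axis in enumerate(transposition):
        original[original_axis] = view_indices[view_axis]
    return tuple(original)


shape = (10, 20, 30)        # hypothetical shape of the original dataset
transposition = (2, 0, 1)

print(view_shape(shape, transposition))               # (30, 10, 20)
# view[k, i, j] with k=7, i=3, j=5 reads original[i, j, k]
print(to_original_indices(transposition, (7, 3, 5)))  # (3, 5, 7)
```

With this transposition, the first index of the view selects a position along the last axis of the original dataset, which is what lets the first dimension act as the image index.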

My class is defined as follows:

import numpy


class TransposedDatasetView(object):
    """
    This class provides a way to transpose a dataset without
    casting it into a numpy array. This way, the dataset in a file need not
    be read entirely into memory to view it in a different
    transposition.

    .. note::
        Performance depends heavily on the way the dataset was written
        to file. Depending on the chunking strategy, reading a complete 2D slice
        in an unfavorable direction may still require the entire dataset to
        be read from disk.

    :param dataset: h5py dataset
    :param transposition: List of dimension numbers in the wanted order
    """
    def __init__(self, dataset, transposition=None):
        """

        """
        super(TransposedDatasetView, self).__init__()
        self.dataset = dataset
        """original dataset"""

        self.shape = dataset.shape
        """Tuple of array dimensions"""
        self.dtype = dataset.dtype
        """Data-type of the array's elements"""
        self.ndim = len(dataset.shape)
        """Number of array dimensions"""

        size = 0
        if self.ndim:
            size = 1
            for dimsize in self.shape:
                size *= dimsize
        self.size = size
        """Number of elements in the array."""

        self.transposition = list(range(self.ndim))
        """List of dimension indices, in an order depending on the
        specified transposition. By default this is simply
        [0, ..., self.ndim - 1], but it can be changed by specifying a different
        `transposition` parameter at initialization.

        Use :meth:`transpose`, to create a new :class:`TransposedDatasetView`
        with a different :attr:`transposition`.
        """

        if transposition is not None:
            assert len(transposition) == self.ndim
            assert set(transposition) == set(list(range(self.ndim))), \
                "Transposition must be a list containing all dimensions"
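            # store as a list so that the comparison with list(range(ndim))
            # in __getitem__ also works when a tuple is passed
            self.transposition = list(transposition)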
            self.__sort_shape()

    def __sort_shape(self):
        """Sort shape in the order defined in :attr:`transposition`
        """
        new_shape = tuple(self.shape[dim] for dim in self.transposition)
        self.shape = new_shape

    def __sort_indices(self, indices):
        """Return array indices sorted in the order needed
        to access data in the original non-transposed dataset.

        :param indices: Tuple of ndim indices, in the order needed
            to access the view
        :return: Sorted tuple of indices, to access original data
        """
        assert len(indices) == self.ndim
        sorted_indices = tuple(idx for (_, idx) in
                               sorted(zip(self.transposition, indices)))
        return sorted_indices

    def __getitem__(self, item):
        """Handle fancy indexing with regards to the dimension order as
        specified in :attr:`transposition`

        The supported fancy-indexing syntax is explained at
        http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing.

        Additional restrictions exist if the data has been transposed:

            - numpy boolean array indexing is not supported
            - ellipsis objects are not supported

        :param item: Index, possibly fancy index (must be supported by h5py)
        :return:
        """
        # no transposition, let the original dataset handle indexing
        if self.transposition == list(range(self.ndim)):
            return self.dataset[item]

        # 1-D slicing -> n-D slicing (n=1)
        if not hasattr(item, "__len__"):
            # first dimension index is given
            item = [item]
            # following dimensions are indexed with : (all elements)
            # slice(None) is equivalent to ":" and avoids sys.maxint,
            # which does not exist in Python 3
            item += [slice(None) for _i in range(self.ndim - 1)]

        # n-dimensional slicing
        if len(item) != self.ndim:
            raise IndexError(
                "N-dim slicing requires a tuple of N indices/slices. " +
                "Needed dimensions: %d" % self.ndim)

        # get list of indices sorted in the original dataset order
        sorted_indices = self.__sort_indices(item)

        output_data_not_transposed = self.dataset[sorted_indices]

        # now we must transpose the output data
        output_dimensions = []
        frozen_dimensions = []
        for i, idx in enumerate(item):
            # slices and sequences
            if not isinstance(idx, int):
                output_dimensions.append(self.transposition[i])
            # regular integer index
            else:
                # whenever a dimension is fixed (indexed by an integer)
                # the number of output dimension is reduced
                frozen_dimensions.append(self.transposition[i])

        # decrement output dimensions that are above frozen dimensions
        for frozen_dim in reversed(sorted(frozen_dimensions)):
            for i, out_dim in enumerate(output_dimensions):
                if out_dim > frozen_dim:
                    output_dimensions[i] -= 1

        assert (len(output_dimensions) + len(frozen_dimensions)) == self.ndim
        assert set(output_dimensions) == set(range(len(output_dimensions)))

        return numpy.transpose(output_data_not_transposed,
                               axes=output_dimensions)

    def __array__(self, dtype=None):
        """Cast the dataset into a numpy array, and return it.

        If a transposition has been done on this dataset, return
        a transposed view of a numpy array."""
        return numpy.transpose(numpy.array(self.dataset, dtype=dtype),
                               self.transposition)

    def transpose(self, transposition=None):
        """Return a re-ordered (dimensions permuted)
        :class:`TransposedDatasetView`.

        The returned object refers to
        the same dataset but with a different :attr:`transposition`.

        :param list[int] transposition: List of dimension numbers in the wanted order
        :return: Transposed TransposedDatasetView
        """
        # by default, reverse the dimensions
        if transposition is None:
            transposition = list(reversed(self.transposition))

        return TransposedDatasetView(self.dataset,
                                     transposition)
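
When some of the indices are integers, the data read from the dataset has fewer dimensions, so the axes passed to numpy.transpose have to be renumbered. The sketch below computes them by ranking the surviving original axes, which should give the same result as the decrement loop in __getitem__. The function name output_axes is hypothetical and the code is a standalone illustration, not part of the class.

```python
def output_axes(transposition, item):
    """Axes for numpy.transpose, given data read in original axis order.

    `item` is the index tuple in view order; integer entries drop their axis.
    """
    # original axes that survive the indexing, listed in view order
    kept = [transposition[i] for i, idx in enumerate(item)
            if not isinstance(idx, int)]
    # position of each surviving axis in the data returned by the dataset
    rank = {axis: pos for pos, axis in enumerate(sorted(kept))}
    return [rank[axis] for axis in kept]


everything = slice(None)
print(output_axes((2, 0, 1), (everything, everything, everything)))  # [2, 0, 1]
print(output_axes((2, 0, 1), (5, everything, everything)))           # [0, 1]
print(output_axes((2, 0, 1), (everything, 3, everything)))           # [1, 0]
```

In the last case the view is indexed as view[:, 3, :], so the dataset returns a 2D block ordered as (original axis 1, original axis 2), and the two axes must be swapped to match the view's order.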

Answer 1: (score: 1)

I had a similar problem and, largely inspired by PiRK's answer, my team released a pip-installable package for this: https://pypi.org/project/lazy-ops/