由python读取的复杂Matlab结构mat文件

时间:2018-02-25 06:15:42

标签: python matlab scipy mat

我知道mat文件的版本问题,这些文件对应于python中的不同加载模块,即scipy.ioh5py。我还搜索了很多类似的问题,例如scipy.io.loadmat nested structures (i.e. dictionaries)How to preserve Matlab struct when accessing in python?。但是当涉及到更复杂的mat文件时,它们都会失败。我的anno_bbox.mat文件结构如下所示:

前两个级别:

anno_bbox

bbox_test

大小:

size

在海里:

hoi

在hoi bboxhuman:

bboxhuman

当我使用spio.loadmat('anno_bbox.mat', struct_as_record=False, squeeze_me=True)时,它只能将第一级信息作为字典获取。

>>> anno_bbox.keys()
dict_keys(['__header__', '__version__', '__globals__', 'bbox_test', 
'bbox_train', 'list_action'])
>>> bbox_test = anno_bbox['bbox_test']
>>> bbox_test.keys()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
>>> bbox_test
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab128>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab2b0>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8660ab710>,
   ...,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ec4a8>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622ecb00>,
   <scipy.io.matlab.mio5_params.mat_struct object at 0x7fa8622f1198>], dtype=object)

我不知道接下来该做什么。这对我来说太复杂了。该文件位于anno_bbox.mat(8.7MB)

2 个答案:

答案 0 :(得分:3)

我得到了(在共享文件中工作对这种情况是个好主意):

正在加载:

data = io.loadmat('../Downloads/anno_bbox.mat')

我明白了:

In [96]: data['bbox_test'].dtype
Out[96]: dtype([('filename', 'O'), ('size', 'O'), ('hoi', 'O')])
In [97]: data['bbox_test'].shape
Out[97]: (1, 9658)

我本可以分配bbox_test=data['bbox_test']。这个变量有9658条记录,有三个字段,每个字段都有一个对象dtype。

所以有一个文件名(嵌入在1个元素数组中的字符串)

In [101]: data['bbox_test'][0,0]['filename']
Out[101]: array(['HICO_test2015_00000001.jpg'], dtype='<U26')

size有3个字段,其中3个数字嵌入数组(2d matlab矩阵):

In [102]: data['bbox_test'][0,0]['size']
Out[102]: 
array([[(array([[640]], dtype=uint16), array([[427]], dtype=uint16), array([[3]], dtype=uint8))]],
      dtype=[('width', 'O'), ('height', 'O'), ('depth', 'O')])
In [112]: data['bbox_test'][0,0]['size'][0,0].item()
Out[112]: 
(array([[640]], dtype=uint16),
 array([[427]], dtype=uint16),
 array([[3]], dtype=uint8))

hoi更复杂:

In [103]: data['bbox_test'][0,0]['hoi']
Out[103]: 
array([[(array([[246]], dtype=uint8), array([[(array([[320]], dtype=uint16), array([[359]], dtype=uint16), array([[306]], dtype=uint16), array([[349]], dtype=uint16)),...
      dtype=[('id', 'O'), ('bboxhuman', 'O'), ('bboxobject', 'O'), ('connection', 'O'), ('invis', 'O')])


In [126]: data['bbox_test'][0,1]['hoi']['id']
Out[126]: 
array([[array([[132]], dtype=uint8), array([[140]], dtype=uint8),
        array([[144]], dtype=uint8)]], dtype=object)
In [130]: data['bbox_test'][0,1]['hoi']['bboxhuman'][0,0]
Out[130]: 
array([[(array([[226]], dtype=uint8), array([[340]], dtype=uint16), array([[18]], dtype=uint8), array([[210]], dtype=uint8))]],
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])

因此,您在MATLAB结构中显示的数据全部存在于阵列的嵌套结构中(通常为2d(1,1)形状),对象dtype或多个字段。

回过头来加载squeeze_me我会更简单:

In [133]: data['bbox_test'][1]['hoi']['bboxhuman']
Out[133]: 
array([array((226, 340, 18, 210),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((230, 356, 19, 212),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')]),
       array((234, 342, 13, 202),
      dtype=[('x1', 'O'), ('x2', 'O'), ('y1', 'O'), ('y2', 'O')])],
      dtype=object)

使用struct_as_record='False',我得到了

In [136]: data['bbox_test'][1]
Out[136]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9748>

查看此rec的属性,我看到我可以访问&#39;字段&#39;按属性名称:

In [137]: rec = data['bbox_test'][1]
In [138]: rec.filename
Out[138]: 'HICO_test2015_00000002.jpg'
In [139]: rec.size
Out[139]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9b38>

In [141]: rec.size.width
Out[141]: 640
In [142]: rec.hoi
Out[142]: 
array([<scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9be0>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841e9e10>,
       <scipy.io.matlab.mio5_params.mat_struct object at 0x7f90841ee0b8>],
      dtype=object)

In [145]: rec.hoi[1].bboxhuman
Out[145]: <scipy.io.matlab.mio5_params.mat_struct at 0x7f90841e9f98>
In [146]: rec.hoi[1].bboxhuman.x1
Out[146]: 230

In [147]: vars(rec.hoi[1].bboxhuman)
Out[147]: 
{'_fieldnames': ['x1', 'x2', 'y1', 'y2'],
 'x1': 230,
 'x2': 356,
 'y1': 19,
 'y2': 212}

等等。

答案 1 :(得分:1)

我对以下位置的答案进行了更改:https://stackoverflow.com/a/29126361/12899265

from scipy.io import loadmat, matlab
def load_mat(filename):
    """
    This function should be called instead of direct scipy.io.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    """

    def _check_vars(d):
        """
        Checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        """
        for key in d:
            if isinstance(d[key], matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
            elif isinstance(d[key], np.ndarray):
                d[key] = _toarray(d[key])
        return d

    def _todict(matobj):
        """
        A recursive function which constructs from matobjects nested dictionaries
        """
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                d[strg] = _toarray(elem)
            else:
                d[strg] = elem
        return d

    def _toarray(ndarray):
        """
        A recursive function which constructs ndarray from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        """
        if ndarray.dtype != 'float64':
            elem_list = []
            for sub_elem in ndarray:
                if isinstance(sub_elem, matlab.mio5_params.mat_struct):
                    elem_list.append(_todict(sub_elem))
                elif isinstance(sub_elem, np.ndarray):
                    elem_list.append(_toarray(sub_elem))
                else:
                    elem_list.append(sub_elem)
            return np.array(elem_list)
        else:
            return ndarray

    data = loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_vars(data)

如果它是具有结构的矩阵/单元,它可以遍历变量,并且通过不经过没有结构的矩阵也可以使其更快。