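scipy.io.loadmat nested structures (i.e. dictionaries)

Time: 2011-08-10 09:32:09

Tags: python nested structure scipy dictionary

Using the given routine (how to load Matlab .mat files with scipy), I cannot access deeper nested structures to recover them as dictionaries.

To present the problem in more detail, here is a toy example:

import scipy.io as spio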
a = {'b':{'c':{'d': 3}}}
# my dictionary: a['b']['c']['d'] = 3
spio.savemat('xy.mat',a)

Now I want to read the .mat file back into Python. I tried the following:

vig=spio.loadmat('xy.mat',squeeze_me=True)

If I now try to access the fields, I get:

>> vig['b']
array(((array(3),),), dtype=[('c', '|O8')])
>> vig['b']['c']
array(array((3,), dtype=[('d', '|O8')]), dtype=object)
>> vig['b']['c']['d']
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/<ipython console> in <module>()

ValueError: field named d not found.

However, with the option struct_as_record=False the field can be accessed:

v=spio.loadmat('xy.mat',squeeze_me=True,struct_as_record=False)

The field can now be accessed with:

>> v['b'].c.d
array(3)

5 answers:

Answer 0: (score: 43)

Here are the functions that reconstruct the dictionaries. Just use this loadmat instead of scipy.io's loadmat:

import scipy.io as spio

def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)

# Note: newer SciPy releases expose this class as scipy.io.matlab.mat_struct
# and deprecate the mio5_params path used below.

def _check_keys(d):
    '''
    checks if entries in the dictionary are mat-objects. If yes,
    _todict is called to change them to nested dictionaries
    '''
    # `d` instead of `dict`, so the built-in type is not shadowed
    for key in d:
        if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
            d[key] = _todict(d[key])
    return d

def _todict(matobj):
    '''
    A recursive function which constructs nested dictionaries from mat-objects
    '''
    d = {}
    for strg in matobj._fieldnames:
        elem = matobj.__dict__[strg]
        if isinstance(elem, spio.matlab.mio5_params.mat_struct):
            d[strg] = _todict(elem)
        else:
            d[strg] = elem
    return d
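
A minimal round-trip sketch of the approach above, reusing the toy dictionary from the question. Two things here are assumptions rather than part of the answer: the helper names `load_nested`, `_to_dict` and `_is_struct` are made up for illustration, and structs are detected by duck typing (`hasattr(obj, '_fieldnames')`) instead of `isinstance`, because the import path of `mat_struct` differs between SciPy versions.

```python
import os
import tempfile

import scipy.io as spio


def _is_struct(obj):
    # Assumption: objects loaded with struct_as_record=False expose
    # `_fieldnames`, the same attribute the answer's code relies on.
    return hasattr(obj, '_fieldnames')


def _to_dict(matobj):
    """Recursively convert a mat_struct-like object into a nested dict."""
    result = {}
    for name in matobj._fieldnames:
        elem = getattr(matobj, name)
        result[name] = _to_dict(elem) if _is_struct(elem) else elem
    return result


def load_nested(filename):
    """Load a .mat file and convert top-level structs into nested dicts."""
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return {key: _to_dict(val) if _is_struct(val) else val
            for key, val in data.items()}


# Round trip: save the toy dictionary, then load it back as plain dicts.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    loaded = load_nested(path)

print(loaded['b']['c']['d'])
```

After loading, `loaded['b']` is an ordinary dict, so the nested value is reached with plain key lookups.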

Answer 1: (score: 16)

Just an enhancement to mergen's answer, which unfortunately stops recursing when it reaches a cell array of objects. The following version turns cell arrays into lists instead and continues recursing into the cell array elements where possible.

import numpy as np
import scipy.io as spio


def loadmat(filename):
    '''
    this function should be called instead of direct spio.loadmat
    as it cures the problem of not properly recovering python dictionaries
    from mat files. It calls the function check keys to cure all entries
    which are still mat-objects
    '''
    def _check_keys(d):
        '''
        checks if entries in dictionary are mat-objects. If yes
        todict is called to change them to nested dictionaries
        '''
        for key in d:
            if isinstance(d[key], spio.matlab.mio5_params.mat_struct):
                d[key] = _todict(d[key])
        return d

    def _todict(matobj):
        '''
        A recursive function which constructs from matobjects nested dictionaries
        '''
        d = {}
        for strg in matobj._fieldnames:
            elem = matobj.__dict__[strg]
            if isinstance(elem, spio.matlab.mio5_params.mat_struct):
                d[strg] = _todict(elem)
            elif isinstance(elem, np.ndarray):
                # note: numeric arrays are converted to lists as well
                d[strg] = _tolist(elem)
            else:
                d[strg] = elem
        return d

    def _tolist(ndarray):
        '''
        A recursive function which constructs lists from cellarrays
        (which are loaded as numpy ndarrays), recursing into the elements
        if they contain matobjects.
        '''
        elem_list = []
        for sub_elem in ndarray:
            if isinstance(sub_elem, spio.matlab.mio5_params.mat_struct):
                elem_list.append(_todict(sub_elem))
            elif isinstance(sub_elem, np.ndarray):
                elem_list.append(_tolist(sub_elem))
            else:
                elem_list.append(sub_elem)
        return elem_list

    # spio is the only name bound by the import above, so call it directly
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return _check_keys(data)
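
To see the cell-array case on a concrete file, the sketch below saves a struct whose field holds a cell array of structs and loads it back. It is a variation, not the answer's exact code: the names `load_nested` and `_convert` are hypothetical, structs are detected by their `_fieldnames` attribute, and only object arrays (which is how cell arrays are loaded) are turned into lists, so plain numeric arrays stay NumPy arrays.

```python
import os
import tempfile

import numpy as np
import scipy.io as spio


def _convert(elem):
    """Recursively turn structs into dicts and cell arrays into lists."""
    if hasattr(elem, '_fieldnames'):            # struct loaded as an object
        return {name: _convert(getattr(elem, name))
                for name in elem._fieldnames}
    if isinstance(elem, np.ndarray) and elem.dtype == object:
        # atleast_1d guards against 0-d arrays, which cannot be iterated
        return [_convert(sub) for sub in np.atleast_1d(elem)]
    return elem                                  # numbers, strings, numeric arrays


def load_nested(filename):
    data = spio.loadmat(filename, struct_as_record=False, squeeze_me=True)
    return {key: _convert(val) for key, val in data.items()}


# An object array is written by savemat as a cell array; each dict inside
# it is written as a struct.
cells = np.empty(2, dtype=object)
cells[0] = {'x': 1}
cells[1] = {'x': 2}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cells.mat')
    spio.savemat(path, {'s': {'cells': cells, 'nums': np.arange(3.0)}})
    loaded = load_nested(path)

print(loaded['s']['cells'])
print(type(loaded['s']['nums']))
```

Restricting the list conversion to object arrays is a design choice of this sketch: it keeps numeric data usable for array operations, at the cost of differing from the answer, which converts every array to a list.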

Answer 2: (score: 2)

Found a solution: the contents of a "scipy.io.matlab.mio5_params.mat_struct object" can be inspected via:

v['b'].__dict__['c'].__dict__['d']

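Answer 3: (score: 2)

I was told on the scipy mailing list (https://mail.python.org/pipermail/scipy-user/) that there are two ways to access this data.

This works:

import scipy.io as spio
vig=spio.loadmat('xy.mat')
print(vig['b'][0, 0]['c'][0, 0]['d'][0, 0])

Output on my machine: 3

The reason for this kind of access: "For historic reasons, in Matlab everything is at least a 2D array, even scalars." So scipy.io.loadmat mimics Matlab's behavior by default.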
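
The "everything is at least 2D" point can be checked directly on the toy file from the question. The sketch below is only an illustration; the temporary path and the variable names `outer` and `value` are assumptions, not part of the answer.

```python
import os
import tempfile

import scipy.io as spio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    # Defaults: no squeezing, structs returned as NumPy record arrays.
    vig = spio.loadmat(path)

outer = vig['b']
print(outer.shape)        # shape of the struct array holding 'b'
print(outer.dtype.names)  # field names of that struct

# Each level needs a [0, 0] to pick the single element of a 1x1 array
# before the next field name can be applied.
value = outer[0, 0]['c'][0, 0]['d'][0, 0]
print(value)
```

Passing `squeeze_me=True` removes these unit dimensions, which is why the other answers do not need the `[0, 0]` indices.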

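Answer 4: (score: 0)

Another method that works:

import scipy.io as spio
vig=spio.loadmat('xy.mat',squeeze_me=True)
print(vig['b']['c'].item()['d'])

Output:

3

I also learned this method on the scipy mailing list. I certainly do not understand why '.item()' has to be added, and why:

print(vig['b']['c']['d'])

throws an error:

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

But once I understand it, I will come back and add an explanation. The explanation of numpy.ndarray.item (from the NumPy reference): Copy an element of an array to a standard Python scalar and return it.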
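
One way to read the behavior, based on the output shown in the question: `vig['b']['c']` is a 0-d array of dtype object that merely holds the inner record, so it has no field named 'd' itself, and `.item()` returns the held record. The sketch below wraps that idea in two helpers; `unwrap` and `get_nested` are hypothetical names introduced here, not SciPy or NumPy functions.

```python
import os
import tempfile

import numpy as np
import scipy.io as spio


def unwrap(value):
    """Peel off 0-d object arrays until the wrapped object is reached."""
    while (isinstance(value, np.ndarray)
           and value.dtype == object and value.ndim == 0):
        value = value.item()
    return value


def get_nested(record, *names):
    """Follow a chain of struct field names, unwrapping after each step."""
    value = unwrap(record)
    for name in names:
        value = unwrap(value[name])
    return value


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'xy.mat')
    spio.savemat(path, {'b': {'c': {'d': 3}}})
    vig = spio.loadmat(path, squeeze_me=True)

# Equivalent in spirit to vig['b']['c'].item()['d'], for any depth.
result = get_nested(vig['b'], 'c', 'd')
print(result)
```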

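(Note that this answer is essentially the same as hpaulj's comment on the original question, but I felt the comment was not "visible" or understandable enough. I certainly did not notice it when I first searched for a solution a few weeks ago.)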