Question

I'm trying to write a dictionary to a .mat file with scipy.io.savemat(), but when I do, the contents change!

Here is the array I want to assign to the dictionary key "Genes":

vectorizeddf.index.values.astype(np.str_)

which prints as

array(['44M2.3', 'A0A087WSV2', 'A0A087WT57', ..., 'tert-rmrp_human',
       'tert-terc_human', 'wisp3 varinat'], 
      dtype='<U44')

Then I do

genedict = {"Genes": vectorizeddf.index.values.astype(np.str_), 
         "X": vectorizeddf.values, 
         "ID": vectorizeddf.columns.values.astype(np.str_)}
import scipy.io as sio
sio.savemat("goa_human.mat", genedict)

But when I load the dictionary with

goadict = sio.loadmat("goa_human.mat")

my strings are padded with spaces!

>>> goadict['Genes']
array(['44M2.3                                      ',
       'A0A087WSV2                                  ',
       'A0A087WT57                                  ', ...,
       'tert-rmrp_human                             ',
       'tert-terc_human                             ',
       'wisp3 varinat                               '],
      dtype='<U44')
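A minimal, self-contained reproduction of the padding might look like this. It assumes a reasonably recent SciPy/NumPy and uses an in-memory io.BytesIO buffer as a stand-in for goa_human.mat; the three gene names come from the printout above, everything else is illustrative:

```python
import io

import numpy as np
import scipy.io as sio

# Three of the gene names from the question; their lengths are 6, 10 and 13.
genes = np.array(['44M2.3', 'A0A087WSV2', 'wisp3 varinat'])

buf = io.BytesIO()                      # in-memory stand-in for goa_human.mat
sio.savemat(buf, {'Genes': genes})
buf.seek(0)
loaded = sio.loadmat(buf)['Genes']

for s in loaded:
    # Every entry comes back padded with blanks to the longest length (13).
    print(repr(s), len(s))
```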

This is far from ideal. On the other hand, when I access

genedict['ID']

I get

array(['GO:0000002', 'GO:0000003', 'GO:0000009', ..., 'GO:2001303',
       'GO:2001306', 'GO:2001311'], 
     dtype='<U10')

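which is the array's original format from before saving. It seems to me the problem lies in the dtype, but I did my best to make both of them strings. I'm not sure why one is <U44 and the other is <U10. How can I fix this?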
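A likely explanation for the <U44 versus <U10 difference, inferred from the printouts rather than confirmed in the question: savemat pads every string to the length of the longest one, and GO identifiers always have the form GO: plus seven digits, i.e. exactly 10 characters, so the ID strings have nothing to pad. A short check under that assumption, using only IDs and gene names visible in the question (round_trip is a hypothetical helper, not part of SciPy):

```python
import io

import numpy as np
import scipy.io as sio


def round_trip(arr):
    """Hypothetical helper: save arr under a throwaway key and load it back."""
    buf = io.BytesIO()
    sio.savemat(buf, {'x': arr})
    buf.seek(0)
    return sio.loadmat(buf)['x']


# All GO IDs have the same width (10 chars), so padding adds nothing.
ids = np.array(['GO:0000002', 'GO:0000003', 'GO:2001311'])
ids_back = round_trip(ids)
print(ids_back.dtype, list(ids_back) == list(ids))

# Mixed-width strings, by contrast, are padded to the widest entry (15 here).
genes_back = round_trip(np.array(['44M2.3', 'tert-rmrp_human']))
print(genes_back.dtype, [len(s) for s in genes_back])
```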

Thanks!

Answer 1

Let's try saving a variety of objects.
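The interactive session that originally went here did not survive in this copy. The sketch below is a hypothetical reconstruction of that kind of experiment, using the four names (alist, adict, objs, strs) and the values visible in the Octave output; the contents of adict (a single field one holding 0 to 4) are an assumption read off that output, and io.BytesIO stands in for test.mat:

```python
import io

import numpy as np
import scipy.io as sio

words = ['one', 'two', 'three', 'four']
d = {
    'alist': words,                          # plain Python list of str
    'adict': {'one': np.arange(5)},          # assumed contents; saved as a struct
    'objs': np.array(words, dtype=object),   # object array; saved as a cell array
    'strs': np.array(words),                 # '<U5' array; saved as a char matrix
}

buf = io.BytesIO()                           # in-memory stand-in for test.mat
sio.savemat(buf, d)
buf.seek(0)
dd = sio.loadmat(buf)

for key in ('alist', 'objs', 'strs'):
    # Show how each object comes back: dtype, shape and first element.
    print(key, dd[key].dtype, dd[key].shape, repr(dd[key].ravel()[0]))
```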


This is for scipy version '0.14.1', not a particularly new one, but I haven't read about any recent changes to this io code.

In Octave, I get:

octave:14> data = load('test.mat')
data =
  scalar structure containing the fields:
    alist =
    one
    two
    three
    four
    adict =
      scalar structure containing the fields:
        one =
           0   1   2   3   4
    objs =
    {
      [1,1] = one
      [1,2] = two
      [1,3] = three
      [1,4] = four
    }
    strs =
    one
    two
    three
    four

The list and str arrays both produce (4,5) character arrays in Octave, while the dtype=object array produces a cell array of strings.

In both d and dd, the strs array is U5, taking 80 bytes (4 words * 5 chars/word * 4 bytes/char), but in dd the strings are padded with blanks:

In [617]: d['strs'][0]
Out[617]: 'one'
In [618]: dd['strs'][0]
Out[618]: 'one  '
In [619]: d['strs'][0].tostring()
Out[619]: b'o\x00\x00\x00n\x00\x00\x00e\x00\x00\x00'
In [620]: dd['strs'][0].tostring()
Out[620]: b'o\x00\x00\x00n\x00\x00\x00e\x00\x00\x00 \x00\x00\x00 \x00\x00\x00'

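I haven't looked into why an array like d['strs'] doesn't display the padding; somehow it distinguishes between blank and null bytes. Note that this is with Py3, where the default strings are unicode. I don't know whether Py2 byte strings behave differently (apart from taking 1 byte/char).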
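One way to see that difference is to look at the raw UCS-4 code points. This NumPy-only sketch builds the padded array by hand, as an assumed stand-in for what loadmat returns: NumPy's fixed-width '<U5' type fills unused slots with null characters (code point 0) and drops trailing nulls when an element is read, whereas the blanks added by savemat are ordinary spaces (code point 32) that stay part of the string.

```python
import numpy as np

original = np.array(['one', 'three'])   # dtype '<U5', unused slots hold nulls
padded = np.array(['one  ', 'three'])   # stand-in for what loadmat hands back

# View each 4-byte character as an integer code point (native byte order).
print(original.view(np.uint32)[:5])     # 111, 110, 101 ('o', 'n', 'e'), then 0, 0
print(padded.view(np.uint32)[:5])       # 111, 110, 101, then 32, 32 (spaces)

# Trailing nulls are dropped when an element is read; trailing spaces are not.
print(repr(str(original[0])), repr(str(padded[0])))
```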

So yes, io.savemat does change string arrays (and lists) by padding them with blanks to the full dtype width. The intent is to create a MATLAB-style character matrix.
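If the goal is just to get the original strings back in Python, one option (a sketch, not something this answer prescribes) is to strip the trailing blanks after loading. np.char.rstrip removes only trailing whitespace, so the internal space in 'wisp3 varinat' survives; the caveat is that a string that genuinely ended in spaces before saving would be altered too. The in-memory buffer is again an assumed stand-in for the .mat file:

```python
import io

import numpy as np
import scipy.io as sio

genes = np.array(['44M2.3', 'A0A087WSV2', 'wisp3 varinat'])

buf = io.BytesIO()
sio.savemat(buf, {'Genes': genes})
buf.seek(0)
padded = sio.loadmat(buf)['Genes']

# Strip only trailing blanks, element by element; internal spaces are kept.
restored = np.char.rstrip(padded)
print(restored.tolist())
```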

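@zeeMonkeez's link covers this, including a way to convert the character matrix to a cell array:

octave:25> cellstr(data.strs)
ans =
{
  [1,1] = one
  [2,1] = two
  [3,1] = three
  [4,1] = four
}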
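On the Python side, a rough counterpart of cellstr is to pass savemat an object array instead of a '<U..' array, so MATLAB/Octave receive a cell array of strings (the objs case above) and no padding is added. The gene list below is a hypothetical stand-in for the question's vectorizeddf index. In many pandas versions a string index is already an object array, so simply dropping the .astype(np.str_) call may have the same effect, but that is an assumption to check against the real DataFrame. Note that loadmat returns each cell as its own small array, so the values need unwrapping:

```python
import io

import numpy as np
import scipy.io as sio

# Hypothetical stand-in for vectorizeddf.index.values.
gene_names = ['44M2.3', 'A0A087WSV2', 'wisp3 varinat']

buf = io.BytesIO()
# dtype=object makes savemat write a cell array instead of a char matrix.
sio.savemat(buf, {'Genes': np.array(gene_names, dtype=object)})
buf.seek(0)
cells = sio.loadmat(buf)['Genes']        # object array, one entry per cell

# Each cell holds a one-element string array; pull the bare strings out.
genes_back = [str(c.ravel()[0]) for c in cells.ravel()]
print(genes_back)
```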

