Question

I have two np.ndarray objects:

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8], ...])
filenames = np.array(["file1", "file2", "file3", ...])

Each file in filenames corresponds to one array in predictions:

file1 ==> [0.2, 0.9]

file2 ==> [0.01, 0.0]

file3 ==> [0.3, 0.8] ...

I want to write the values from these two arrays to a csv file, like this:

fileName        label1      label2
file1           0.2         0.9 
file2           0.01        0.0
file3           0.3         0.8

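I was hoping to use np.stack to merge these two arrays into one data structure and then write it to a csv file with np.savetxt(path, array).

But np.stack(arrays, axis=1) seems to accept only arrays that have the same shape. Is there a way to make stacking work in this case?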

Answer 1

A solution using the numpy.expand_dims and numpy.hstack routines:

import numpy as np
result = np.hstack((np.expand_dims(filenames, axis=1), predictions))

# saving to csv file using `np.savetxt`:
with open('./text_files/predictions.csv', 'wb') as fh:
    np.savetxt(fh, X= result, header='fileName\tlabel1\tlabel2', delimiter='\t', fmt='%-8s\t%-6s\t%-6s')

Contents of predictions.csv (test file):

# fileName  label1  label2
file1       0.2     0.9   
file2       0.01    0.0   
file3       0.3     0.8

Answer 2

Here is one way with zip (shown in a Python 2 session):

>>> np.array(zip(filenames, *zip(*predictions)))
array([['file1', '0.2', '0.9'],
       ['file2', '0.01', '0.0'],
       ['file3', '0.3', '0.8']], 
      dtype='|S5')

Another with np.vstack:

>>> np.vstack((filenames, predictions.T)).T
array([['file1', '0.2', '0.9'],
       ['file2', '0.01', '0.0'],
       ['file3', '0.3', '0.8']], 
      dtype='|S5')
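
For Python 3, where zip returns an iterator rather than a list, the zip approach above needs an explicit list(...). This is only a sketch of that variant: it uses predictions.T.tolist() in place of the nested zip(*predictions), and the names rows and table are mine.

```python
import numpy as np

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
filenames = np.array(["file1", "file2", "file3"])

# In Python 3, zip() is lazy, so materialize it with list() before
# handing it to np.array. Each row becomes (name, label1, label2).
rows = list(zip(filenames.tolist(), *predictions.T.tolist()))

# Rows that mix str and float are stored as strings in the result.
table = np.array(rows)

print(table.shape)  # (3, 3)
```

As in the outputs above, the combined array holds strings, so the numbers have to be converted back with float(...) if they are needed later.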

Answer 3

You can add an extra dimension to filenames and then stack it with predictions using hstack().
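
The original code for this answer did not survive, so the following is only my guess at what it might look like. The astype(str) call is my own addition to make the string conversion explicit, and the names names_col and combined are hypothetical.

```python
import numpy as np

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
filenames = np.array(["file1", "file2", "file3"])

# filenames has shape (3,); indexing with np.newaxis turns it into a
# (3, 1) column so it lines up with the (3, 2) predictions.
names_col = filenames[:, np.newaxis]

# Converting the floats to strings first is an assumption of this sketch:
# it keeps the result dtype explicit instead of relying on promotion.
combined = np.hstack((names_col, predictions.astype(str)))

print(combined.shape)  # (3, 3)
```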

Answer 4

You have 2 arrays: one is 2d with numbers, the other is 1d with strings.

In [53]: predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
    ...: filenames = np.array(["file1", "file2", "file3"])

In [54]: predictions
Out[54]: 
array([[ 0.2 ,  0.9 ],
       [ 0.01,  0.  ],
       [ 0.3 ,  0.8 ]])
In [55]: filenames
Out[55]: 
array(['file1', 'file2', 'file3'], 
      dtype='<U5')

If you add a dimension to filenames (so it becomes (3,1)), you can concatenate it with the other array; note the axis. I'm using Py3, so my default string type is unicode (U5).

In [56]: arr = np.concatenate((filenames[:,None], predictions),axis=1)
In [57]: arr
Out[57]: 
array([['file1', '0.2', '0.9'],
       ['file2', '0.01', '0.0'],
       ['file3', '0.3', '0.8']], 
      dtype='<U32')

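Note that the result is of string type, which may be fine. column_stack and vstack would work too, but they end up adjusting the dimensions and calling concatenate, just as I did.

np.stack joins arrays on a new dimension. I don't think you want a 3D array.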
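
To make that concrete, here is a small sketch (the variable names stacked and shape_error are mine) showing that np.stack adds an axis when the shapes match and refuses inputs whose shapes differ:

```python
import numpy as np

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
filenames = np.array(["file1", "file2", "file3"])

# Stacking two (3, 2) arrays along a new axis 1 gives a 3-D result.
stacked = np.stack((predictions, predictions), axis=1)
print(stacked.shape)  # (3, 2, 2)

# Mixing a (3,) array with a (3, 2) array is rejected with a ValueError.
try:
    np.stack((filenames, predictions), axis=1)
    shape_error = None
except ValueError as exc:
    shape_error = exc
print(shape_error is not None)  # True
```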

In [58]: np.savetxt('test', arr, fmt='%10s')
In [59]: cat test
     file1        0.2        0.9
     file2       0.01        0.0
     file3        0.3        0.8

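You can tweak fmt, but with strings you are stuck with some variation on %s. savetxt also allows a header and footer.

To get more control over fmt, such as the number of decimal places, we have to construct a structured array, one that mixes a string field with 2 float fields. I can expand on that if needed.

Another option is to just zip the arrays and write lines. savetxt isn't doing anything magical when it writes the text file.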

In [65]: for f, n in zip(filenames, predictions):
    ...:     print('%s  %s'%(f, '%10.2f %10.2f'%tuple(n)))
    ...:     
file1        0.20       0.90
file2        0.01       0.00
file3        0.30       0.80

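Given the complexity of creating a structured array from 1 column of strings and a 2-column float array, this last zip approach is probably the simplest.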
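
If an actual csv file is wanted rather than printed lines, the same zip loop can feed the standard csv module. This is a sketch under my own assumptions: an in-memory buffer stands in for a real file, the two-decimal format is arbitrary, and the header names are taken from the question.

```python
import csv
import io

import numpy as np

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
filenames = np.array(["file1", "file2", "file3"])

# The buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["fileName", "label1", "label2"])
for name, scores in zip(filenames.tolist(), predictions.tolist()):
    # Format each float explicitly to control the number of decimals.
    writer.writerow([name] + [f"{score:.2f}" for score in scores])

csv_lines = buffer.getvalue().splitlines()
print(csv_lines[1])  # file1,0.20,0.90
```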

Structured array

In [114]: arr = np.zeros((3,),np.dtype('U10,f,f'))
In [115]: arr['f0']=filenames
In [116]: arr['f1']=predictions[:,0]
In [117]: arr['f2']=predictions[:,1]
In [118]: np.savetxt('test',arr, fmt='%10s %10.2f %10.1f')
In [119]: cat test
     file1       0.20        0.9
     file2       0.01        0.0
     file3       0.30        0.8

A simpler way to construct this array is:

arr = np.rec.fromarrays((filenames, predictions[:,0], predictions[:,1]))
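
Putting that together with savetxt, a comma-delimited file with a header might be written as below. Treat it as a sketch: the field names, the in-memory buffer, and the two-decimal formats are my own choices, not part of the session above.

```python
import io

import numpy as np

predictions = np.array([[0.2, 0.9], [0.01, 0.0], [0.3, 0.8]])
filenames = np.array(["file1", "file2", "file3"])

# One field per column; the field names are illustrative and come from
# the header in the question.
records = np.rec.fromarrays(
    (filenames, predictions[:, 0], predictions[:, 1]),
    names="fileName,label1,label2",
)

# The buffer stands in for a file path. comments="" drops the default
# "# " prefix that savetxt puts in front of the header line.
buffer = io.StringIO()
np.savetxt(buffer, records, fmt="%s,%.2f,%.2f",
           header="fileName,label1,label2", comments="")
saved_lines = buffer.getvalue().splitlines()

print(saved_lines[1])  # file1,0.20,0.90
```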

I'd prefer to make the structured array like this:

In [123]: dt=np.dtype([('files', 'U10'), ('pred', 'float64', (2,))])
In [124]: dt
Out[124]: dtype([('files', '<U10'), ('pred', '<f8', (2,))])
In [125]: arr = np.zeros((3,),dtype=dt)
In [126]: arr['files']=filenames
In [127]: arr['pred']=predictions
In [128]: arr
Out[128]: 
array([('file1', [0.2, 0.9]), ('file2', [0.01, 0.0]), ('file3', [0.3, 0.8])], 
      dtype=[('files', '<U10'), ('pred', '<f8', (2,))])

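But np.savetxt can't handle that compound dtype, so I had to fall back to putting the predictions in separate fields.

pandas is better at writing files with row labels.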
