NumPy: returning a recarray with column headers and a dtype?

Time: 2014-07-28 22:04:05

Tags: python arrays numpy

I'm following this example exactly:

  

http://wiki.scipy.org/Cookbook/InputOutput#head-d528c8c74e765542c351a768b47c7bc9a2ca8e85

import numpy
def readarray(filename, dtype, separator=','):
   """ Read a file with an arbitrary number of columns.
       The type of data in each column is arbitrary
       It will be cast to the given dtype at runtime
   """
   cast = numpy.cast
   data = [[] for dummy in xrange(len(dtype))]
   for line in open(filename, 'r'):
       fields = line.strip().split(separator)
       for i, number in enumerate(fields):
           data[i].append(number)
   for i in xrange(len(dtype)):
       data[i] = cast[dtype[i]](data[i])
   return numpy.rec.array(data, dtype=dtype)

datadescribe = numpy.dtype([('column1', 'i4'),
                            ('column2', 'i4'),
                            ('column3', 'S'),
                            ('column4', 'S'),
                            ('column5', 'i4'),
                           ])

print readarray("results.csv", datadescribe)

Here is results.csv:

22,2,C,G,6
4,1,G,T,7
11,1,G,-,7
23,1,G,T,7

And here is the output of the print statement:

[(22, 2, '', '', 6)
 (4, 1, '', '', 7)
 (11, 1, '', '', 7)
 (23, 1, '', '', 7)]

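My pylint plugin reports the error "Module 'numpy' has no 'cast' member", but when I print numpy.cast I get an object that maps dtypes to lambdas. How does cast help me set "headers" for these numpy arrays, and how do I then access the fields as attributes? My string columns are also missing. Could this be caused by an incorrect dtype declaration? I only recently learned Python and have started learning numpy for a project, so any insight would be appreciated!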

1 Answer:

Answer 0 (score: 1)

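Give the S type a length. A bare 'S' is a zero-length string type, so every value in those columns ends up as ''. Use 'S1' for single characters: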

datadescribe = numpy.dtype([('column1', 'i4'),
                            ('column2', 'i4'),
                            ('column3', 'S1'),
                            ('column4', 'S1'),
                            ('column5', 'i4'),
                           ])

This produces:

[(22, 2, 'C', 'G', 6) (4, 1, 'G', 'T', 7) (11, 1, 'G', '-', 7) (23, 1, 'G', 'T', 7)]
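To see why the bare 'S' loses the strings: an unsized 'S' dtype reserves 0 bytes, while 'S1' reserves one byte per value, and anything longer than the declared width is silently cut off. A small check, assuming Python 3 and a current NumPy (where 'S' values are bytes, hence the b'' prefixes):

```python
import numpy as np

# An unsized bytes dtype reserves no storage at all.
print(np.dtype('S').itemsize)   # 0
# 'S1' reserves exactly one byte per element.
print(np.dtype('S1').itemsize)  # 1

# Values longer than the declared width are truncated, not rejected.
trimmed = np.array(['GT', 'C'], dtype='S1')
print(trimmed)  # [b'G' b'C']

# The corrected structured dtype: 4 + 4 + 1 + 1 + 4 bytes per record.
datadescribe = np.dtype([('column1', 'i4'),
                         ('column2', 'i4'),
                         ('column3', 'S1'),
                         ('column4', 'S1'),
                         ('column5', 'i4')])
print(datadescribe.itemsize)  # 14
```

If a column can hold longer strings, size it for the longest expected value (for example 'S10'), otherwise the extra characters are dropped.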

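Don't worry about the pylint message. pylint is probably not fully integrated with numpy, and there is likely another SO question about that.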
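Note that the recipe is Python 2 code built on numpy.cast, which NumPy deprecated in 1.25 and removed in 2.0; the suggested replacement for cast[dtype](values) is np.asarray(values, dtype=dtype), and dropping numpy.cast also makes the pylint warning go away. A minimal Python 3 sketch under those assumptions, where read_columns is a hypothetical name and an in-memory io.StringIO stands in for results.csv:

```python
import io
import numpy as np

def read_columns(lines, dtype, separator=','):
    """Hypothetical helper: build a record array from delimited text lines.

    Assumes one value per dtype field on every line, no header row,
    and no missing values; blank lines are skipped.
    """
    columns = [[] for _ in dtype.names]
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines such as a trailing newline
        fields = line.strip().split(separator)
        for i, value in enumerate(fields):
            columns[i].append(value)
    # np.asarray(values, dtype=...) does the conversion numpy.cast[...] used to do.
    arrays = [np.asarray(col, dtype=dtype[i]) for i, col in enumerate(columns)]
    # The field names (the "column headers") come from the structured dtype.
    return np.rec.fromarrays(arrays, dtype=dtype)


datadescribe = np.dtype([('column1', 'i4'),
                         ('column2', 'i4'),
                         ('column3', 'S1'),
                         ('column4', 'S1'),
                         ('column5', 'i4')])

# In-memory stand-in for results.csv, with the same four rows.
sample = io.StringIO("22,2,C,G,6\n4,1,G,T,7\n11,1,G,-,7\n23,1,G,T,7\n")
X = read_columns(sample, datadescribe)
print(X.column1)  # [22  4 11 23]
print(X.column3)  # [b'C' b'G' b'G' b'G']
```

The point of the design is that numpy.cast never set any headers: the column names live entirely in the dtype passed to the record-array constructor.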


X = readarray("stack25005105.csv", datadescribe)
print X.dtype
print X.dtype.names
print X['column1']
print X['column3']

This produces:

[('column1', '<i4'), ('column2', '<i4'), ('column3', 'S1'), ('column4', 'S1'), ('column5', '<i4')]
('column1', 'column2', 'column3', 'column4', 'column5')
[22  4 11 23]
['C' 'G' 'G' 'G']
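On the attribute-access part of the question: because readarray returns a numpy.rec.array (a recarray), every field named in the dtype is also available as an attribute, so X.column1 holds the same data as X['column1'], and a single row exposes its fields the same way. A short self-contained check, assuming Python 3 and building a two-row recarray directly from tuples rather than from the file:

```python
import numpy as np

datadescribe = np.dtype([('column1', 'i4'),
                         ('column2', 'i4'),
                         ('column3', 'S1'),
                         ('column4', 'S1'),
                         ('column5', 'i4')])

# numpy.rec.array accepts a list of tuples, one tuple per row.
X = np.rec.array([(22, 2, b'C', b'G', 6),
                  (4, 1, b'G', b'T', 7)], dtype=datadescribe)

# The "column headers" are simply the dtype's field names.
print(X.dtype.names)

# Attribute access and key access return the same column.
print(X.column1)                                # [22  4]
print(np.array_equal(X.column1, X['column1']))  # True

# A single row is a record whose fields are attributes too.
print(X[0].column3)                             # b'C'
```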