Question

I am following this example exactly:

http://wiki.scipy.org/Cookbook/InputOutput#head-d528c8c74e765542c351a768b47c7bc9a2ca8e85

import numpy
def readarray(filename, dtype, separator=','):
   """ Read a file with an arbitrary number of columns.
       The type of data in each column is arbitrary
       It will be cast to the given dtype at runtime
   """
   cast = numpy.cast
   data = [[] for dummy in xrange(len(dtype))]
   for line in open(filename, 'r'):
       fields = line.strip().split(separator)
       for i, number in enumerate(fields):
           data[i].append(number)
   for i in xrange(len(dtype)):
       data[i] = cast[dtype[i]](data[i])
   return numpy.rec.array(data, dtype=dtype)

datadescribe = numpy.dtype([('column1', 'i4'),
                            ('column2', 'i4'),
                            ('column3', 'S'),
                            ('column4', 'S'),
                            ('column5', 'i4'),
                           ])

print readarray("results.csv", datadescribe)

Here is results.csv:

22,2,C,G,6
4,1,G,T,7
11,1,G,-,7
23,1,G,T,7

Here is what the print statement outputs:

[(22, 2, '', '', 6)
 (4, 1, '', '', 7)
 (11, 1, '', '', 7)
 (23, 1, '', '', 7)]

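My pylint plugin reports the error Module 'numpy' has no 'cast' member, but when I print numpy.cast I get an object containing dtypes and lambdas. How does cast help me set up the 'headers' for these numpy arrays, and how can the fields then be accessed as attributes? I am also missing my string columns. Could that be because the dtype statement is incorrect? I recently learned Python and have started learning numpy for a project, so any insight would be appreciated!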

Answer 1

Give the S types a length:

datadescribe = numpy.dtype([('column1', 'i4'),
                        ('column2', 'i4'),
                        ('column3', 'S1'),
                        ('column4', 'S1'),
                        ('column5', 'i4'),
                       ])

produces

[(22, 2, 'C', 'G', 6) (4, 1, 'G', 'T', 7) (11, 1, 'G', '-', 7) (23, 1, 'G', 'T', 7)]
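To see why the length matters, here is a minimal sketch (written for Python 3, with made-up sample values) of how a fixed-width S field stores its data. The number after S is the count of bytes reserved per element, and anything longer is silently truncated, so a field with no usable width has no room for the text.

```python
import numpy as np

# Sample values are illustrative only; 'GT' is deliberately two bytes long.
values = [b"C", b"GT", b"-"]

narrow = np.array(values, dtype="S1")  # one byte per element
wide = np.array(values, dtype="S3")    # up to three bytes per element

print(narrow.dtype.itemsize)  # bytes reserved per element in the S1 array
print(narrow.tolist())        # 'GT' loses its second byte here
print(wide.tolist())          # all values fit in three bytes
```

For results.csv every string column is a single character, so S1 is enough. If longer values were possible, a wider field such as S3 would be the safer choice.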

Don't worry about the pylint message. Pylint is probably not fully integrated with numpy. There may be another SO question about that.

X = readarray("stack25005105.csv", datadescribe)
print X.dtype
print X.dtype.names
print X['column1']
print X['column3']

produces

[('column1', '<i4'), ('column2', '<i4'), ('column3', 'S1'), ('column4', 'S1'), ('column5', '<i4')]
('column1', 'column2', 'column3', 'column4', 'column5')
[22  4 11 23]
['C' 'G' 'G' 'G']
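On the attribute-access part of the question: because readarray returns a numpy.rec.array, each field name should also be usable as an attribute. Below is a rough Python 3 sketch of the same idea. The function name read_records and the in-memory sample are my own assumptions, not part of the cookbook recipe, and it uses astype in place of numpy.cast, which was removed in NumPy 2.0.

```python
import io
import numpy as np

datadescribe = np.dtype([('column1', 'i4'),
                         ('column2', 'i4'),
                         ('column3', 'S1'),
                         ('column4', 'S1'),
                         ('column5', 'i4')])

def read_records(source, dtype, separator=','):
    """Hypothetical Python 3 variant of readarray: text lines in, recarray out."""
    columns = [[] for _ in dtype.names]
    for line in source:
        fields = line.strip().split(separator)
        if fields == ['']:
            continue  # skip blank lines
        for i, field in enumerate(fields):
            columns[i].append(field)
    # astype converts each column of strings to its field's type,
    # which is the job numpy.cast did in the original function
    typed = [np.array(col).astype(dtype[name])
             for col, name in zip(columns, dtype.names)]
    return np.rec.fromarrays(typed, dtype=dtype)

# In-memory stand-in for results.csv so the sketch runs on its own
sample = io.StringIO("22,2,C,G,6\n4,1,G,T,7\n11,1,G,-,7\n23,1,G,T,7\n")
X = read_records(sample, datadescribe)

print(X.column1)     # attribute access, same data as X['column1']
print(X.column3)
print(X[0].column5)  # a single record exposes its fields the same way
```

So the 'headers' come from the names in the dtype, not from cast; cast (or astype here) only converts each column of strings to the right type. Note that under Python 3 the S1 fields hold bytes, so they print as b'C' rather than 'C'.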

Numpy returning recarray with column 'headers' and 'dtype'?
