I am following this example exactly:
http://wiki.scipy.org/Cookbook/InputOutput#head-d528c8c74e765542c351a768b47c7bc9a2ca8e85
import numpy

def readarray(filename, dtype, separator=','):
    """ Read a file with an arbitrary number of columns.
        The type of data in each column is arbitrary
        It will be cast to the given dtype at runtime
    """
    cast = numpy.cast
    data = [[] for dummy in xrange(len(dtype))]
    for line in open(filename, 'r'):
        fields = line.strip().split(separator)
        for i, number in enumerate(fields):
            data[i].append(number)
    for i in xrange(len(dtype)):
        data[i] = cast[dtype[i]](data[i])
    return numpy.rec.array(data, dtype=dtype)

datadescribe = numpy.dtype([('column1', 'i4'),
                            ('column2', 'i4'),
                            ('column3', 'S'),
                            ('column4', 'S'),
                            ('column5', 'i4'),
                            ])
print readarray("results.csv", datadescribe)
Here is results.csv:
22,2,C,G,6
4,1,G,T,7
11,1,G,-,7
23,1,G,T,7
Here is what the print statement outputs:
[(22, 2, '', '', 6)
(4, 1, '', '', 7)
(11, 1, '', '', 7)
(23, 1, '', '', 7)]
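My pylint plugin reports the error "Module 'numpy' has no 'cast' member", but when I print numpy.cast I get an object containing dtypes and lambdas. How does cast help me set "headers" for these numpy arrays, and how do I then access the fields by attribute? I am also missing my string columns. Could that be caused by an incorrect dtype declaration? I learned Python recently and have just started using numpy for a project, so any insight would be appreciated!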
Answer 0 (score: 1)
Give the S type a length:
datadescribe = numpy.dtype([('column1', 'i4'),
                            ('column2', 'i4'),
                            ('column3', 'S1'),
                            ('column4', 'S1'),
                            ('column5', 'i4'),
                            ])
which produces
[(22, 2, 'C', 'G', 6) (4, 1, 'G', 'T', 7) (11, 1, 'G', '-', 7) (23, 1, 'G', 'T', 7)]
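A short sketch of why the length matters. It is written for Python 3 and a current NumPy, which is an assumption, since the code above is Python 2. A bare 'S' has an item size of zero, so there is no room to store any character, while 'S1' reserves one byte per value. The variable names are illustrative only.

```python
import numpy as np

bare = np.dtype('S')   # string type with no length given
one = np.dtype('S1')   # string type that holds exactly one byte
print(bare.itemsize)
print(one.itemsize)

# Casting to a fixed-length string type truncates to the declared length.
truncated = np.array([b'Cat', b'G']).astype('S1')
print(truncated)
```

If a column can hold longer strings, pick a length that fits the longest value, for example 'S10'.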
Don't worry about the pylint message. pylint is probably not fully integrated with numpy. There may well be another SO question about that.
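As for what cast does: as far as I know, numpy.cast was a dictionary-like lookup from a type to a small function that converts its argument to an array of that type, which would explain the dtypes and lambdas you see when printing it. It does not set any "headers"; the field names come from the dtype. It was removed in NumPy 2.0, so on a current install a rough equivalent looks like the sketch below. The helper name cast_to is made up for illustration and is not part of numpy.

```python
import numpy as np

def cast_to(values, dtype):
    """Hypothetical stand-in for numpy.cast[dtype](values)."""
    return np.asarray(values).astype(dtype)

# The reader collects every field as a string; this converts one column.
column = cast_to(['22', '4', '11', '23'], 'i4')
print(column.dtype)
print(column)
```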
X = readarray("stack25005105.csv", datadescribe)
print X.dtype
print X.dtype.names
print X['column1']
print X['column3']
which produces
[('column1', '<i4'), ('column2', '<i4'), ('column3', 'S1'), ('column4', 'S1'), ('column5', '<i4')]
('column1', 'column2', 'column3', 'column4', 'column5')
[22 4 11 23]
['C' 'G' 'G' 'G']
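Because readarray returns a numpy.rec.array, the fields should also be reachable as attributes, which is the access style the question asks about. The sketch below assumes Python 3 and builds the record array directly from the rows of results.csv instead of reading the file, so the string values are written as bytes.

```python
import numpy as np

datadescribe = np.dtype([('column1', 'i4'),
                         ('column2', 'i4'),
                         ('column3', 'S1'),
                         ('column4', 'S1'),
                         ('column5', 'i4')])

# Same rows as results.csv, typed by hand for the example.
rows = [(22, 2, b'C', b'G', 6),
        (4, 1, b'G', b'T', 7),
        (11, 1, b'G', b'-', 7),
        (23, 1, b'G', b'T', 7)]
X = np.rec.array(rows, dtype=datadescribe)

print(X.column1)      # whole column, same data as X['column1']
print(X.column4)
print(X[2].column5)   # one field of a single record
```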