I am trying to load a .csv file into an array. However, the file looks like this:
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
.................................
.................................
I am trying to skip the leading string. So far, I have been discarding the first row:
a = np.genfromtxt(file, delimiter=',', skip_header=1)
But I would like to know whether there is a way to read the file into an array while ignoring the string during parsing.
Answer 0 (score: 2)
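Could you use loadtxt(..., usecols=(1,2,3), ...), which would avoid having to skip the row at the start of the file?
The usecols argument just tells loadtxt which columns to extract (and those columns must be numeric).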
# Put data into file (in shell, just me copying the sample)
cat >> /tmp/data.csv
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
# In IPython
In [1]: import numpy as np
In [2]: a = np.loadtxt('/tmp/data.csv', usecols=(1,2,3), delimiter=',')
In [3]: a
Out[3]:
array([[ 0.03435345, -1.234556  , -3.        ],
       [ 1.43567896, -1.45322124,  9.543422  ]])
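Since the question uses genfromtxt rather than loadtxt, here is a minimal sketch suggesting that the same usecols idea carries over. It assumes the two sample rows from the question and feeds them through io.StringIO instead of a real file (that stand-in is only for illustration):

```python
import io

import numpy as np

# The two sample rows from the question, held in memory instead of on disk.
sample = (
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# Column 0 (the quoted string, or the empty field) is never selected,
# so no row has to be skipped.
a = np.genfromtxt(io.StringIO(sample), delimiter=',', usecols=(1, 2, 3))
print(a)
```

Only columns 1 to 3 are converted, so the first row keeps its numeric values instead of being thrown away.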
Answer 1 (score: 0)
Since the string only appears on the first line of the file, you could write a helper generator that strips it off:
def helper(filename):
    with open(filename) as fin:
        # this could get more robust ... e.g. by doing typechecking if necessary.
        line = next(fin).split(',')
        yield ','.join(line[1:])
        for line in fin:
            yield line

arr = np.genfromtxt(helper('myfile.csv'), delimiter=',')
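A quick way to try the helper is to run it on the question's two sample rows. The sketch below writes them to a temporary file (the temporary directory and the reuse of the name myfile.csv are assumptions for illustration) and points at a possible caveat: in that sample the second row starts with an empty field, so dropping the first field from the first line only leaves the two rows shifted by one column relative to each other.

```python
import os
import tempfile

import numpy as np


def helper(filename):
    with open(filename) as fin:
        # Drop the leading field from the first line only.
        first = next(fin).split(',')
        yield ','.join(first[1:])
        # Pass every remaining line through unchanged.
        for line in fin:
            yield line


# The two sample rows from the question.
sample = (
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'myfile.csv')
    with open(path, 'w') as fout:
        fout.write(sample)
    arr = np.genfromtxt(helper(path), delimiter=',')

# The second row still begins with an empty field, which is read as nan.
print(arr)
```

If the later rows in the real file do not start with an empty field, the helper lines the columns up as intended. Otherwise, selecting columns with usecols may be the simpler route.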