Question: Ignoring strings when reading into an array

I'm trying to load a .csv file into an array. However, the file looks like this:

"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422
 .................................
 .................................

I'm trying to skip the leading string. So far, I have been skipping the first row entirely:

 a = np.genfromtxt(file, delimiter=',', skip_header=1)

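But I'd like to know whether there is a way to read the file into an array while ignoring the string during parsing, without throwing away the rest of that row.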
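To make the cost of skipping the row concrete, here is a small sketch that feeds the two sample rows to genfromtxt through io.StringIO instead of a real file. It uses skip_header, which is the current name for genfromtxt's old skiprows argument (newer NumPy versions no longer accept skiprows). The variable name `sample` is just for illustration.

```python
import io
import numpy as np

# The two sample rows from the question, held in memory instead of a file.
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# skip_header=1 discards the whole first line, numbers included.
a = np.genfromtxt(sample, delimiter=',', skip_header=1)
print(a)  # only the second row survives; its empty first field becomes nan
```

The numbers on the first line are lost along with the string, which is exactly what the question wants to avoid.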

Answer 1

You can use loadtxt(..., usecols=(1,2,3), ...); that way there is no need to skip a line at the start of the file.

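The usecols argument simply tells loadtxt which columns to extract, given as zero-based column indices, so the string in column 0 is never converted.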

# Put data into file (in the shell; paste the sample, then press Ctrl-D)
cat > /tmp/data.csv
"myfilename",0.034353453,-1.234556,-3,45671234
,1.43567896, -1.45322124, 9.543422

# In IPython
In [1]: import numpy as np

In [2]: a = np.loadtxt('/tmp/data.csv', usecols=(1,2,3), delimiter=',')

In [3]: a
Out[3]: 
array([[ 0.03435345, -1.234556  , -3.        ],
       [ 1.43567896, -1.45322124,  9.543422  ]])
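The same call can be tried without creating /tmp/data.csv first. This is a minimal sketch that assumes the two sample rows make up the whole file and passes them to loadtxt through io.StringIO, since loadtxt accepts any file-like object.

```python
import io
import numpy as np

# Stand-in for /tmp/data.csv: the two sample rows from the question.
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

# Columns 1-3 are numeric on every row; column 0 (the string / empty field)
# and the extra fifth field on the first row are simply not selected.
a = np.loadtxt(sample, usecols=(1, 2, 3), delimiter=',')
print(a)
```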

Answer 2

Since the string appears only on the first line of the file, you can write a helper generator that strips it off:

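import numpy as np

def helper(filename):
    """Yield the file's lines, dropping the leading string field from the first one."""
    with open(filename) as fin:
        # this could get more robust ... e.g. by doing typechecking if necessary.
        line = next(fin).split(',')
        yield ','.join(line[1:])  # first line without its first field
        for line in fin:
            yield line  # remaining lines pass through unchanged

# genfromtxt accepts a generator of lines in place of a filename
arr = np.genfromtxt(helper('myfile.csv'), delimiter=',')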
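With the sample rows from the question, the helper strips the label only from the first line. The following lines still begin with an empty field, so genfromtxt puts nan in their first column and shifts their values one column to the right. A sketch of a variant, assuming every line's first field is a label (text on the first line, empty on the rest), drops that field from every line instead. The generator name `strip_first_field` is hypothetical. `usecols=(0, 1, 2)` is included only because the sample's first line carries one more field than the second; with usecols set, genfromtxt only needs each row to contain the selected columns.

```python
import io
import numpy as np

def strip_first_field(lines):
    """Hypothetical helper: yield each line with its first comma-separated field removed."""
    for line in lines:
        # partition splits at the first comma only; lines without a comma pass through.
        _, sep, rest = line.partition(',')
        yield rest if sep else line

# In-memory copy of the question's sample rows (stands in for the real file).
sample = io.StringIO(
    '"myfilename",0.034353453,-1.234556,-3,45671234\n'
    ',1.43567896, -1.45322124, 9.543422\n'
)

arr = np.genfromtxt(strip_first_field(sample), delimiter=',', usecols=(0, 1, 2))
print(arr)  # same 2x3 values as the loadtxt/usecols result in Answer 1
```

For this particular layout, passing usecols=(1, 2, 3) straight to genfromtxt or loadtxt is simpler; a generator mainly pays off when the lines need more cleanup than column selection.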
