Question

我的数据：

a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2

from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')

Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1

Answer 1

您可以使用error_bad_lines=False功能的read_csv选项。它会自动跳过格式错误的行并打印出来。

Answer 2

问题是你没有任何长度为6的列（最长的是5），我不认为 read_csv中有一个关键字来克服这个问题。

一个解决方案是更明确：

In [1]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)

In [2]: df['f'] = np.nan

In [3]: df
Out[3]: 
        b    c     d    e   f
a                            
1.50  4.8  NaN   6.3  NaN NaN
1.60  5.2  6.5   7.2  NaN NaN
1.70  5.5  6.6   8.3  5.7 NaN
1.80  6.1  6.7   9.7  6.2 NaN
1.90  7.1  6.8  11.1  6.7 NaN
2.00  NaN  6.8  12.5  7.3 NaN
2.08  NaN  NaN   NaN  7.8 NaN
2.10  NaN  7.2   NaN  NaN NaN
2.20  NaN  8.0   NaN  NaN NaN
2.30  NaN  8.7   NaN  NaN NaN
2.40  NaN  9.2   8.2  NaN NaN

pandas read_csv中缺少数据

我的数据：

2 个答案: