a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
from pandas import read_csv
ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
Traceback (most recent call last):
  File "read_lin.py", line 7, in <module>
    ds = read_csv ('lin-nan.dat', index_col=0, sep=',')
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 253, in read_csv
    return _read(TextParser, filepath_or_buffer, kdict)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 202, in _read
    return parser.get_chunk()
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 844, in get_chunk
    alldata = self._rows_to_cols(content)
  File "/home/nbecker/.local/lib/python2.7/site-packages/pandas/io/parsers.py", line 809, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expecting 6 columns, got 5 in row 1
Answer 0 (score: 1)
You can use the error_bad_lines=False option of read_csv. It automatically skips malformed lines and prints them out.
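For anyone on a newer pandas: as far as I know, error_bad_lines was deprecated in pandas 1.3 and later removed, and the replacement spelling is on_bad_lines="skip". A minimal sketch of that newer option, assuming pandas 1.3 or later; the sample data and the variable names raw and skipped are made up for illustration and are not the file from the question:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample (not the question's file): the second data line
# has one field more than the header.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# on_bad_lines="skip" silently drops lines with too many fields.
# Assumes pandas >= 1.3; older releases used error_bad_lines=False.
skipped = pd.read_csv(StringIO(raw), on_bad_lines="skip")
print(skipped)
```

This option targets lines with too many fields. Whether it helps with the short rows in the question may depend on the pandas version, so treat it as something to try rather than a guaranteed fix.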
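Answer 1 (score: 0)
The problem is that none of your rows has 6 fields (the longest has 5), and I don't think read_csv has a keyword to work around this.
One solution is to be more explicit (with pandas imported as pd and numpy as np):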
In [1]: df = pd.read_csv('lin-nan.dat', names=list('abcde'), index_col=0, skiprows=1)
In [2]: df['f'] = np.nan
In [3]: df
Out[3]:
        b    c     d    e    f
a
1.50  4.8  NaN   6.3  NaN  NaN
1.60  5.2  6.5   7.2  NaN  NaN
1.70  5.5  6.6   8.3  5.7  NaN
1.80  6.1  6.7   9.7  6.2  NaN
1.90  7.1  6.8  11.1  6.7  NaN
2.00  NaN  6.8  12.5  7.3  NaN
2.08  NaN  NaN   NaN  7.8  NaN
2.10  NaN  7.2   NaN  NaN  NaN
2.20  NaN  8.0   NaN  NaN  NaN
2.30  NaN  8.7   NaN  NaN  NaN
2.40  NaN  9.2   8.2  NaN  NaN
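
If you want to try the same approach without the file on disk, here is a self-contained sketch of it. It inlines the data from the question through io.StringIO (that part is my assumption, the original reads 'lin-nan.dat') and otherwise uses the same read_csv arguments as above:

```python
from io import StringIO

import numpy as np
import pandas as pd

# The file contents from the question, inlined so the example runs on its own.
raw = """a,b,c,d,e,f
1.5,4.8,,6.3
1.60,5.2,6.5,7.2
1.70,5.5,6.6,8.3,5.7
1.80,6.1,6.7,9.7,6.2
1.90,7.1,6.8,11.1,6.7
2,,6.8,12.5,7.3
2.08,,,,7.8
2.1,,7.2
2.2,,8.0
2.3,,8.7
2.4,,9.2,8.2
"""

# Skip the 6-name header and supply the 5 names the data rows can actually fill;
# rows with fewer than 5 fields are padded with NaN.
df = pd.read_csv(StringIO(raw), names=list("abcde"), index_col=0, skiprows=1)

# Add the column that never appears in the data as all-NaN.
df["f"] = np.nan
print(df)
```

Column a becomes the index, so the resulting frame has the five columns b through f.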