我正在研究一个机器学习项目,我应该读一个csv文件来建立一个线性回归模型,这里我读的是csv文件
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0)
但是当我跑步时我得到了这个错误
/usr/bin/python2.7 /home/halawa/PycharmProjects/ML/evergreen.py
Traceback (most recent call last):
File "/home/halawa/PycharmProjects/ML/evergreen.py", line 24, in <module>
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 470, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 256, in _read
return parser.read()
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 715, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1164, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 758, in pandas.parser.TextReader.read (pandas/parser.c:7411)
File "pandas/parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7651)
File "pandas/parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas/parser.c:8268)
File "pandas/parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8142)
File "pandas/parser.pyx", line 1758, in pandas.parser.raise_parser_error (pandas/parser.c:20728)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 8
Process finished with exit code 1
答案 0 :(得分:1)
您的问题是您的CSV在每一行上没有一致的字段数。例如,看起来第一行有3个字段
x,y,z
虽然第三行有8个
x,y,z,a,b,c,d,e
您需要修复源CSV文件以避免此错误。
或者,如果您知道最多有8个字段,并且某些行缺少字段,则可以使用names
:
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0, names=list('abcdefgh'))
此参数告诉CSV读取器预期有多少字段,其余字段用默认值填充。
修改强>
如果您的空列标有?
,那么您应该像这样设置pandas na_values
参数:
data_test = pd.read_csv("/media/halawa/93B77F681EC1B4D2/GUC/Semster 8/CSEN 1022 Machine Learning/2/test.csv",delimiter=",", header=0, na_values=['?'])