I am using pandas to read a CSV file, and I get this error:
  File "antifraud.py", line 11, in <module>
    df = pd.read_csv(trainFilePath, names=['time', 'id1', 'id2', 'amount', 'message'])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 470, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 256, in _read
    return parser.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 715, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 1164, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 758, in pandas.parser.TextReader.read (pandas/parser.c:7411)
  File "pandas/parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7651)
  File "pandas/parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas/parser.c:8268)
  File "pandas/parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8142)
  File "pandas/parser.pyx", line 1758, in pandas.parser.raise_parser_error (pandas/parser.c:20728)
pandas.parser.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
When I try to read the file with the csv module instead:
import csv

with open(filepath, 'r') as f:
    reader = csv.reader(f)
    linenumber = 1
    try:
        for row in reader:
            linenumber += 1
    except Exception as e:
        # Exceptions have no .message attribute in Python 3, so format e itself
        print("Error line %d: %s %s" % (linenumber, type(e), e))
I see that the error occurs on one specific line. That line is:
2016-11-02 09:45:43, 10244, 26248, 20.06, 提供一天 我充滿
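My question: does the data in the file contain escape characters such as '\r' or '\n', is pandas unable to read the Chinese text because I did not specify an encoding, or is it something else?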
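One way to tell these hypotheses apart is to inspect the raw bytes before any parser touches them. The sketch below is only an illustration: `find_suspect_lines` and the sample data are hypothetical names made up here, and the check assumes the file is meant to be UTF-8. It flags lines that contain a carriage return not followed by a line feed, and lines that do not decode as UTF-8.

```python
def find_suspect_lines(raw: bytes):
    """Return (line_number, reason) pairs for lines that may confuse a CSV parser."""
    suspects = []
    # Split on '\n' only, so a bare '\r' stays inside its line and can be detected.
    for number, line in enumerate(raw.split(b"\n"), start=1):
        # Drop the '\r' that belongs to an ordinary '\r\n' line ending.
        body = line[:-1] if line.endswith(b"\r") else line
        if b"\r" in body:
            suspects.append((number, "bare carriage return"))
        try:
            body.decode("utf-8")  # assumption: the file is supposed to be UTF-8
        except UnicodeDecodeError:
            suspects.append((number, "not valid UTF-8"))
    return suspects


# Made-up sample: the second row has a bare '\r' inside the message field.
sample = "a, 1, ok\nb, 2, 提供一天\r我充滿\nc, 3, fine\r\n".encode("utf-8")
print(find_suspect_lines(sample))
```

For a real file, the bytes could be obtained with `open(filepath, 'rb').read()`. If only "bare carriage return" entries show up, the encoding is probably not the culprit.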
Answer 0 (score: 1)
This has been reported as an issue. As a workaround, you can try the following solution provided by @chris-b1:
pd.read_csv(open(trainFilePath,'rU'), encoding='utf-8')
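Note that the 'U' mode flag was removed in Python 3.11. In Python 3, text mode already applies universal newlines by default (`newline=None`), so a roughly equivalent sketch might look like the following. The helper names `open_universal` and `read_transactions` are hypothetical, the column names are taken from the question, and `skipinitialspace=True` is an added assumption based on the spaces after the commas in the sample row.

```python
import os
import tempfile


def open_universal(path, encoding="utf-8"):
    # newline=None (the default) turns on universal newlines:
    # '\r', '\n' and '\r\n' are all translated to '\n' when reading.
    return open(path, "r", encoding=encoding, newline=None)


def read_transactions(path):
    # Imported lazily so open_universal can be tried without pandas installed.
    import pandas as pd

    with open_universal(path) as handle:
        return pd.read_csv(
            handle,
            names=["time", "id1", "id2", "amount", "message"],
            skipinitialspace=True,  # assumption: fields are separated by ", "
        )


# Small demonstration of the newline translation on made-up data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as f:
        f.write("t1, 1, 2, 3.5, hello\rt2, 3, 4, 20.06, 提供一天\r\n".encode("utf-8"))
    with open_universal(demo_path) as handle:
        demo_lines = handle.read().split("\n")

print(demo_lines)
```

Keep in mind that with universal newlines a bare '\r' inside a field is also treated as a row break, so a message containing one will be split across two rows rather than kept intact.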