Question

I am using pandas to read a CSV file, and I get this error:

  File "antifraud.py", line 11, in <module>
    df = pd.read_csv(trainFilePath, names=['time', 'id1', 'id2', 'amount', 'message'])
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 470, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 256, in _read
    return parser.read()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 715, in read
    ret = self._engine.read(nrows)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas/io/parsers.py", line 1164, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 758, in pandas.parser.TextReader.read (pandas/parser.c:7411)
  File "pandas/parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:7651)
  File "pandas/parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas/parser.c:8268)
  File "pandas/parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8142)
  File "pandas/parser.pyx", line 1758, in pandas.parser.raise_parser_error (pandas/parser.c:20728)
pandas.parser.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

When I try to read it with the csv module instead:

import csv

# filepath is the path to the CSV file being checked
with open(filepath, 'r') as f:
    reader = csv.reader(f)
    linenumber = 1
    try:
        for row in reader:
            linenumber += 1
    except Exception as e:
        # Exceptions have no .message attribute in Python 3; format e itself
        print("Error line %d: %s %s" % (linenumber, type(e), e))

I can see that the error occurs on one specific line. That line is:

2016-11-02 09:45:43, 10244, 26248, 20.06, 提供一天 我充滿

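My question is: does the data in the file contain escape characters such as '\r' or '\n', is it because pandas cannot read Chinese text since I did not specify an encoding, or is it something else?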

Answer 1

There is a reported issue for this. As a workaround, you can try the following solution provided by @chris-b1:

pd.read_csv(open(trainFilePath,'rU'), encoding='utf-8')
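
Note that the 'rU' mode was deprecated in Python 3.4 and removed in Python 3.11. On current Python versions, opening the file in plain text mode with newline=None (the default) gives the same universal-newline behaviour. Below is a minimal sketch of that idea, plus a small check for stray carriage returns, which is one possible cause of this error but not a confirmed diagnosis for your file. The helper names find_bare_carriage_returns and read_with_universal_newlines are made up for illustration, and skipinitialspace=True is an assumption based on the space after each comma in your sample row.

```python
import re


def find_bare_carriage_returns(path):
    """Return 1-based line numbers (split on '\n') that contain a '\r'
    which is not part of a '\r\n' pair."""
    hits = []
    with open(path, "rb") as f:  # binary mode, so no newline translation happens
        for lineno, raw in enumerate(f, start=1):
            if re.search(rb"\r(?!\n)", raw):
                hits.append(lineno)
    return hits


def read_with_universal_newlines(path, names):
    """Read a CSV through a text-mode handle so that '\r', '\n' and '\r\n'
    are all translated to '\n' before pandas parses the data."""
    import pandas as pd  # imported here so the check above works without pandas

    with open(path, "r", encoding="utf-8", newline=None) as f:
        return pd.read_csv(f, names=names, skipinitialspace=True)
```

With the names from the question, this would be called as read_with_universal_newlines(trainFilePath, ['time', 'id1', 'id2', 'amount', 'message']). If find_bare_carriage_returns(trainFilePath) returns an empty list, stray carriage returns are probably not the cause and the file's encoding is the next thing to check.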
