Question

I am using pandas 0.18 with Python 2.7.9 on SUSE Enterprise Linux 11.

I have a file that contains multiple tables:

TABLE_A
col1,col2,...,col8
...


TABLE_B
col1,col2,...,col7
...

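Table A has about 7300 rows and table B about 100. I first scan the file to determine where each table starts and ends. Then I call pandas' read_csv() with the skiprows and nrows options to read the corresponding table into memory, using engine='c'.

I am seeing strange behavior with engine='c'. I can read the first 4552 rows of TABLE_A without any problem, but if I try to read 4553 rows I get an error: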
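A minimal sketch of such a boundary scan, for illustration only. It assumes that each table starts with a line beginning with `TABLE_`, that the column header follows on the next line, and that a table ends at the first blank line or at end of file. The function name `find_tables` and the sample data are hypothetical, not taken from the actual file.

```python
def find_tables(lines, marker="TABLE_"):
    """Return {table_name: (skiprows, nrows)} for every table found in `lines`.

    skiprows is the 0-based index of the table's header line, which equals
    the number of lines to skip before it; nrows counts data rows only.
    """
    tables = {}
    name, start, count = None, None, 0
    for i, line in enumerate(lines):
        text = line.strip()
        if text.startswith(marker):
            # Assumption: the column header sits on the line after the marker.
            name, start, count = text, i + 1, 0
        elif name is not None:
            if text == "":
                # Assumption: a blank line ends the current table.
                tables[name] = (start, max(count - 1, 0))  # minus header line
                name = None
            else:
                count += 1
    if name is not None:
        # The last table runs to end of file.
        tables[name] = (start, max(count - 1, 0))
    return tables


# Hypothetical miniature of the file layout described above.
SAMPLE = """TABLE_A
a,b,c
1,2,3
4,5,6


TABLE_B
x,y
7,8
"""

boundaries = find_tables(SAMPLE.splitlines())
print(boundaries)
```

Each `(skiprows, nrows)` pair can then be passed to `read_csv()` as its `skiprows` and `nrows` arguments; for the first table this gives `skiprows=1`, matching the calls shown in this question.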

>>> df = pd.read_csv(f,engine='c',skiprows=1,nrows=4552)
>>> df.shape
(4552, 7)

>>> df = pd.read_csv(f,engine='c',skiprows=1,nrows=4553)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/python_pkgs/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 529, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/python_pkgs/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 301, in _read
    return parser.read(nrows)
  File "/python_pkgs/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 763, in read
    ret = self._engine.read(nrows)
  File "/python_pkgs/lib/python2.7/site-packages/pandas-0.18.0-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1213, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 800, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8444)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 7 fields in line 7421, saw 8

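Judging from the error message, the C parser seems to keep reading well past the requested rows and runs into TABLE_B, which has only 7 columns (TABLE_A has 8).

However, reading with engine='python' works fine: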

>>> df = pd.read_csv(f,engine='python',skiprows=1,nrows=6000)
>>> df.shape
(6000, 7)
>>>

Is this a bug or a feature/limitation? Perhaps the C parser works by reading the file in chunks? Thanks.
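Whichever the answer turns out to be, one possible way to sidestep the problem is to hand the parser only the lines that belong to one table, so that it cannot read ahead into the next one. The sketch below is an assumption-laden illustration, not a confirmed fix. It assumes Python 3 and a recent pandas rather than the versions above, it assumes no quoted field contains an embedded newline, and the helper name `read_table` and the sample data are hypothetical.

```python
import io
import os
import tempfile
from itertools import islice

import pandas as pd


def read_table(path, skiprows, nrows, **kwargs):
    """Parse one table whose header is at 0-based line `skiprows`,
    followed by `nrows` data rows. Extra kwargs go to pd.read_csv."""
    with open(path, newline="") as fh:
        # Take the header line plus nrows data lines and nothing else.
        wanted = islice(fh, skiprows, skiprows + 1 + nrows)
        buffer = io.StringIO("".join(wanted))
    # The parser only ever sees this table's text.
    return pd.read_csv(buffer, **kwargs)


# Hypothetical miniature file with two tables of different widths.
sample = "TABLE_A\na,b,c\n1,2,3\n4,5,6\n\n\nTABLE_B\nx,y\n7,8\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(sample)
try:
    table_a = read_table(tmp.name, skiprows=1, nrows=2, engine="c")
    table_b = read_table(tmp.name, skiprows=7, nrows=1, engine="c")
finally:
    os.remove(tmp.name)

print(table_a.shape, table_b.shape)
```

The trade-off is that the selected lines are copied into memory once before parsing, which should be negligible for tables of a few thousand rows.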

Pandas: read_csv() with engine=C issue (bug or feature?)

0 Answers: