Pandas error loading a text file: CParserError: Error tokenizing data

Time: 2017-07-08 22:10:33

Tags: python pandas text-files parse-error


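I am new to pandas and am trying to open a text file with it. I wrote the code in Python, changed to the correct directory, and ran the script, but it failed.

Here is the raw data. There are no field names, and the fields in every row are separated by spaces: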

2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55

Here is my simple Python code:

import pandas as pd

DATA_FILE='data.log'
df = pd.read_table(DATA_FILE, sep=" ")

print(df)

But I get the following error:

Traceback (most recent call last):
  File "open.py", line 7, in <module>
    df = pd.read_table(DATA_FILE, sep=" ")
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 401, in _read
    data = parser.read()
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "C:\Users\hh\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas\parser.pyx", line 848, in pandas.parser.TextReader.read (pandas\parser.c:10415)
  File "pandas\parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)
  File "pandas\parser.pyx", line 924, in pandas.parser.TextReader._read_rows (pandas\parser.c:11437)
  File "pandas\parser.pyx", line 911, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:11308)
  File "pandas\parser.pyx", line 2024, in pandas.parser.raise_parser_error (pandas\parser.c:27037)
pandas.io.common.CParserError: Error tokenizing data. C error: Expected 6 fields in line 4, saw 17

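There must be something wrong with my Python code. What is the correct way to write it?

1 Answer:

Answer 0: (score: 0)

You are missing a space in the first line, between the time and the IP address: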

2017-07-02 23:59:51127.0.0.1 

Replace it with:

2017-07-02 23:59:51 127.0.0.1 
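
In a longer log, a line like this is hard to spot by eye. As a rough sketch (not part of the original answer), the field count of each line can be compared against the most common count. The helper name find_irregular_lines is hypothetical, and the sample lines are copied from the question:

```python
from collections import Counter

# Three lines copied from the question; the first one has the time and the
# IP address fused into a single token ("23:59:51127.0.0.1").
SAMPLE_LINES = [
    "2017-07-02 23:59:51127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55",
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55",
    "2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55",
]


def find_irregular_lines(lines, sep=" "):
    """Return (most common field count, [(line number, field count), ...]).

    Only lines whose field count differs from the most common one are listed.
    Line numbers are 1-based.
    """
    counts = [len(line.rstrip("\n").split(sep)) for line in lines]
    expected = Counter(counts).most_common(1)[0][0]
    irregular = [(i, n) for i, n in enumerate(counts, start=1) if n != expected]
    return expected, irregular


expected, irregular = find_irregular_lines(SAMPLE_LINES)
print(expected, irregular)  # prints: 16 [(1, 15)]
```

For a real file, the same function can be given an open file object instead of the list, since it only iterates over lines.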

Just tested (note header=None, since the file has no header row):

In [12]: cat data.log
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=202104&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 986 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=hydrogen-motor 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 2539 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvSearchProduct.aspx cid=100005713&p=&pageindex=&kw=electric-skateboard 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 1172 31.7.188.55
2017-07-02 23:59:51 127.0.0.1 GET /ecvv_product/EcvvHotSearchProduct.aspx kw=stainless-stand 8082 - 127.0.0.1 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;+DigExt;+DTS+Agent - 200 0 0 3152 31.7.188.55

In [13]: dx = pd.read_table('data.log', sep=" ", header=None)

In [14]: dx
Out[14]: 
           0         1          2    3   \
0  2017-07-02  23:59:51  127.0.0.1  GET   
1  2017-07-02  23:59:51  127.0.0.1  GET   
2  2017-07-02  23:59:51  127.0.0.1  GET   
3  2017-07-02  23:59:51  127.0.0.1  GET   

                                        4   \
0     /ecvv_product/EcvvSearchProduct.aspx   
1  /ecvv_product/EcvvHotSearchProduct.aspx   
2     /ecvv_product/EcvvSearchProduct.aspx   
3  /ecvv_product/EcvvHotSearchProduct.aspx   

                                                  5     6  7          8   \
0    cid=202104&p=&pageindex=&kw=electric-skateboard  8082  -  127.0.0.1   
1                                  kw=hydrogen-motor  8082  -  127.0.0.1   
2  cid=100005713&p=&pageindex=&kw=electric-skateb...  8082  -  127.0.0.1   
3                                 kw=stainless-stand  8082  -  127.0.0.1   

                                                  9  10   11  12  13    14  \
0  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0   986   
1  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  2539   
2  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  1172   
3  Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+NT;...  -  200   0   0  3152   

            15  
0  31.7.188.55  
1  31.7.188.55  
2  31.7.188.55  
3  31.7.188.55
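
If editing the log by hand is impractical, the missing space could also be restored in code before parsing. The following is only a sketch under assumptions: the names FUSED_TIME_IP, repair_line and load_log are hypothetical, the file is assumed to be UTF-8, and the pattern assumes the only defect is an HH:MM:SS time fused directly onto an IPv4 address, as in the question's first line.

```python
import io
import re

# An HH:MM:SS time immediately followed by an IPv4 address, with no space
# between them, e.g. "23:59:51127.0.0.1".
FUSED_TIME_IP = re.compile(r"\b(\d{2}:\d{2}:\d{2})(\d{1,3}(?:\.\d{1,3}){3})\b")


def repair_line(line):
    """Insert the missing space between a fused time and IP address."""
    return FUSED_TIME_IP.sub(r"\1 \2", line)


def load_log(path):
    """Repair every line of the log in memory, then parse it with pandas."""
    import pandas as pd  # imported here so repair_line works without pandas

    # Assumption: the log is UTF-8 encoded.
    with open(path, encoding="utf-8") as fh:
        fixed = "".join(repair_line(line) for line in fh)
    # header=None because the log has no row of field names.
    return pd.read_csv(io.StringIO(fixed), sep=" ", header=None)
```

Calling load_log('data.log') on the original, unedited file should then give the same 16-column frame as above, provided no other line is malformed in a different way.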