Hello, I am trying out Jupyter. I installed pandas, Python, and Jupyter, and to check that everything works I tried to open a txt file with pandas as follows:
import pandas as pd
df=pd.read_csv("/authorprof/res_es.txt", sep=" ", header = None)
The txt file looks like this:
Running testing authorid
Running training authorprof
[[325 301]
[236 191]
[294 274]
[354 357]
[237 241]
[344 335]
[419 401]
[312 286]
[209 206]
However, I get the following exception:
-----------------------------------------------------------------------
CParserError Traceback (most recent call last)
<ipython-input-26-c970702c41ed> in <module>()
3 print(sys.version)
4 print(pd.__version__)
----> 5 df=pd.read_csv("/authorprof/res_es.txt", sep=" ", header = None)
6
7
/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
560 skip_blank_lines=skip_blank_lines)
561
--> 562 return _read(filepath_or_buffer, kwds)
563
564 parser_f.__name__ = name
/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
323 return parser
324
--> 325 return parser.read()
326
327 _parser_defaults = {
/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
813 raise ValueError('skip_footer not supported for iteration')
814
--> 815 ret = self._engine.read(nrows)
816
817 if self.options.get('as_recarray'):
/home/neo/anaconda3/lib/python3.5/site-packages/pandas/io/parsers.py in read(self, nrows)
1312 def read(self, nrows=None):
1313 try:
-> 1314 data = self._reader.read(nrows)
1315 except StopIteration:
1316 if self._first_chunk:
pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:8748)()
pandas/parser.pyx in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)()
pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)()
pandas/parser.pyx in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)()
pandas/parser.pyx in pandas.parser.raise_parser_error (pandas/parser.c:23325)()
CParserError: Error tokenizing data. C error: Expected 3 fields in line 71, saw 5
Could someone tell me why I get this exception? I am new to Jupyter notebooks and suspect it may be a bug. For more detail, here are my Python and pandas versions:
3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
0.18.1
Answer 0 (score: 3)
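I think you need the skiprows parameter to skip the first two lines of the txt file, together with a whitespace regex as the separator:
df = pd.read_csv("/authorprof/res_es.txt", sep=r"\s+", header=None, skiprows=2)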
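To illustrate where a message like "Expected 3 fields in line 71, saw 5" comes from, here is a minimal sketch. The two-line input is made up for the demonstration and is not the asker's file: with header=None the parser sizes the table from the first line (3 space-separated fields) and then fails on a later line with more fields. It assumes a pandas version that exposes pd.errors.ParserError; in 0.18.1 the same error is named CParserError, as in the traceback above.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: the first line has 3 space-separated fields,
# the second line has 5.
sample = "Running testing authorid\n1 2 3 4 5\n"

try:
    pd.read_csv(StringIO(sample), sep=" ", header=None)
    message = ""
except pd.errors.ParserError as exc:
    # The parser reports the expected field count, the offending line
    # number, and the field count it actually saw.
    message = str(exc)

print(message)
```

So the exception points to inconsistent field counts in the file rather than to a bug in Jupyter or pandas.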
Sample:
import pandas as pd
import numpy as np
from io import StringIO  # pandas.compat.StringIO was removed in later pandas releases
temp=u"""Running testing authorid
Running training authorprof
[[325 301]
[236 191]
[294 274]
[354 357]
[237 241]
[344 335]
[419 401]
[312 286]
[209 206]"""
# after testing, replace StringIO(temp) with the filename
df = pd.read_csv(StringIO(temp), sep=r"\s+", header=None, skiprows=2)
print(df)
       0     1
0  [[325  301]
1   [236  191]
2   [294  274]
3   [354  357]
4   [237  241]
5   [344  335]
6   [419  401]
7   [312  286]
8   [209  206]
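Note that both columns in the output above still contain the square brackets, so they are strings rather than numbers. If integer columns are wanted, one possible approach (a sketch only, not part of the original answer) is to pull the digits out of each line with a regular expression before building the DataFrame. The helper name read_bracketed_matrix is hypothetical, and the sample is shortened to three data rows.

```python
import re
from io import StringIO

import pandas as pd

# Shortened sample in the same layout as the txt file from the question.
temp = u"""Running testing authorid
Running training authorprof
[[325 301]
[236 191]
[294 274]"""


def read_bracketed_matrix(handle, skiprows=2):
    """Hypothetical helper: collect the integers found on each line."""
    rows = []
    for i, line in enumerate(handle):
        if i < skiprows:
            continue  # skip the two "Running ..." lines
        numbers = re.findall(r"-?\d+", line)
        if numbers:
            rows.append([int(n) for n in numbers])
    return pd.DataFrame(rows)


# After testing, replace StringIO(temp) with an opened file object.
df = read_bracketed_matrix(StringIO(temp))
print(df)
```

This assumes every data line carries the same number of integers; lines with differing counts would leave missing values in the shorter rows.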