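I am trying to specify dtypes when reading data with pandas.read_table. My main reason is not speed but to discard badly formatted records, which unfortunately do occur. However, instead of filling those records with NA, the script simply breaks, and I see no other option in pandas.read_table that would force the conversion.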
This is with pandas 0.17.1.
What can be done?
The relevant lines of the error message:
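One option that read_table itself offers is the converters parameter: it applies a Python function to each raw field of the named columns, so a malformed value can be turned into NaN instead of raising. A minimal sketch, assuming a hypothetical helper to_int_or_nan and a small in-memory tab-separated sample standing in for the real file (the usecols selection is omitted because the sample only has these four columns). Converters run as Python code on every field, so they are slower than a plain dtype conversion.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the second record has the malformed date 2008o730.
RAW = (
    "LopNr\tINDATUMA\tUTDATUMA\tDIAGNOS\n"
    "1\t20080101\t20080105\tI21\n"
    "2\t2008o730\t20080802\tJ18\n"
)


def to_int_or_nan(field):
    """Hypothetical helper: parse a raw string field as int, or return NaN if malformed."""
    try:
        return int(field)
    except ValueError:
        return float("nan")


numeric_cols = ["LopNr", "INDATUMA", "UTDATUMA"]

# Each numeric column goes through the converter; DIAGNOS is left as text.
treatments = pd.read_table(
    io.StringIO(RAW),
    converters={col: to_int_or_nan for col in numeric_cols},
)
print(treatments)
```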
Traceback (most recent call last):
File "/Users/laszlo.sandor/Downloads/mock_monthly_inpatient_treatments.py", line 31, in <module>
treatments = pd.read_table(filename,usecols=[0,3,4,6], engine='c', dtype={'LopNr':np.uint16,'INDATUMA':np.uint16,'UTDATUMA':np.uint16,'DIAGNOS':object})
File "//anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 498, in parser_f
return _read(filepath_or_buffer, kwds)
File "//anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 285, in _read
return parser.read()
File "//anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 747, in read
ret = self._engine.read(nrows)
File "//anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1197, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
File "pandas/parser.pyx", line 865, in pandas.parser.TextReader._read_rows (pandas/parser.c:9261)
File "pandas/parser.pyx", line 972, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10654)
File "pandas/parser.pyx", line 1053, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:12010)
ValueError: cannot safely convert passed user dtype of <u2 for object dtyped data in column 3
The malformed value in column 3 is 2008o730.
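Another approach is to read the problem columns as strings, so the parser never has to reject anything, and then coerce them with pandas.to_numeric(errors="coerce"), which turns unparseable values into NaN. Note also that np.uint16 only holds 0 to 65535, so a yyyymmdd date such as 20080730 cannot fit in it even when well formed; the sketch below casts the cleaned columns to int32 instead. It uses a hypothetical in-memory sample in place of the real file and drops the malformed rows at the end; keep them as NaN instead if that suits the analysis better.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-separated sample; the second record has the malformed date 2008o730.
RAW_TEXT = (
    "LopNr\tINDATUMA\tUTDATUMA\tDIAGNOS\n"
    "1\t20080101\t20080105\tI21\n"
    "2\t2008o730\t20080802\tJ18\n"
)

# 65535: too small for yyyymmdd dates, which is why uint16 is not used below.
UINT16_MAX = np.iinfo(np.uint16).max

# Step 1: read everything as strings so no dtype conversion can fail during parsing.
df = pd.read_table(io.StringIO(RAW_TEXT), dtype=str)

# Step 2: coerce to numbers; malformed values such as "2008o730" become NaN.
for col in ["LopNr", "INDATUMA", "UTDATUMA"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Step 3: drop rows with malformed dates, then narrow to a type that fits yyyymmdd.
clean = df.dropna(subset=["INDATUMA", "UTDATUMA"]).astype(
    {"INDATUMA": "int32", "UTDATUMA": "int32"}
)
print(clean)
```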