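When I read a ~35 MB CSV with Pandas read_csv(), I get an error from the CParser saying the input file may be malformed. A sample is shown below; see the "PNCBANK,NATL" row.
UPDATE ----- When I save the file as a Windows CSV rather than the "Comma Separated" file type, it runs fine with the 'c' engine.
I also read a sample of the CSV with the commas removed from every observation, and the problem persisted. So the commas that appear in the strings below are not what causes this issue.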
685 201603 N 204602 0 1 O 80 44 134000 80 4.125 R N FRM IL SF 61900 F116Q1000024 P 360 2 Other sellers CENTRALMTGECO
776 201604 204603 0 1 O 46 47 108000 46 3.875 R N FRM CO SF 81200 F116Q1000025 C 360 1 Other sellers USBANKNA
693 201603 203102 0 1 S 21 44 81000 21 3.25 R N FRM CO PU 81100 F116Q1000026 N 180 2 Other sellers USBANKNA
715 201603 204602 0 1 S 75 46 63000 75 4.375 R N FRM CO CO 81100 F116Q1000027 P 360 1 Other sellers PNCBANK,NATL
691 201603 204602 30460 0 1 O 24 14 35000 24 3.875 R N FRM KY SF 40300 F116Q1000028 N 360 1 Other sellers Other servicers
758 201603 204602 0 2 I 75 36 85000 75 4.5 R N FRM KY SF 40300 F116Q1000029 P 360 2 Other sellers USBANKNA
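However, when I try switching to the Python engine, I get a readlines error (the second error below).
I believe this is because one column in the file contains strings that occasionally include a comma, and the file delimiter is also a comma. If that is in fact the problem, how can I replace those commas with some other symbol, or remove them entirely, while leaving the rest of the file intact? I know which strings contain these commas, because they are a specific subset of that column's values. Thanks!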
Error from the C engine of read_csv()
Traceback (most recent call last):
File "/Users/paltamura/Desktop/fmData/fmData/exploratory/creditScore_descriptives.py", line 160, in <module>
lender_by_msa = lender_PerformanceByMSA()
File "/Users/paltamura/Desktop/fmData/fmData/exploratory/creditScore_descriptives.py", line 32, in lender_PerformanceByMSA
date_col_fmt_dict={'firstPaymentDate': '%Y%m'}
File "/Users/paltamura/Desktop/fmData/fmData/Load/load_loans.py", line 19, in load_data
nrows=10000 if nrows == 'sample' else nrows
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 325, in _read
return parser.read()
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 815, in read
ret = self._engine.read(nrows)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1314, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 805, in pandas.parser.TextReader.read (pandas/parser.c:8748)
File "pandas/parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:9003)
File "pandas/parser.pyx", line 881, in pandas.parser.TextReader._read_rows (pandas/parser.c:9731)
File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:9602)
File "pandas/parser.pyx", line 1865, in pandas.parser.raise_parser_error (pandas/parser.c:23325)
pandas.io.common.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
readlines() error from the Python engine of read_csv()
Traceback (most recent call last):
File "/Users/paltamura/Desktop/fmData/fmData/exploratory/creditScore_descriptives.py", line 160, in <module>
lender_by_msa = lender_PerformanceByMSA()
File "/Users/paltamura/Desktop/fmData/fmData/exploratory/creditScore_descriptives.py", line 32, in lender_PerformanceByMSA
date_col_fmt_dict={'firstPaymentDate': '%Y%m'}
File "/Users/paltamura/Desktop/fmData/fmData/Load/load_loans.py", line 20, in load_data
engine='python'
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 805, in _make_engine
self._engine = klass(self.f, **self.options)
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1608, in __init__
self.columns, self.num_original_columns = self._infer_columns()
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1823, in _infer_columns
line = self._buffered_line()
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 1975, in _buffered_line
return self._next_line()
File "/anaconda/lib/python2.7/site-packages/pandas/io/parsers.py", line 2006, in _next_line
orig_line = next(self.data)
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
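One possible reading of this message, consistent with the update that re-saving the file as a Windows CSV makes it load, is that the file uses bare carriage-return line endings. The sketch below only illustrates that idea and is not a confirmed diagnosis: the helper name `normalize_newlines` and the two-row sample are hypothetical, and it uses Python 3 and the standard library rather than the Python 2.7 setup in the traceback.

```python
import csv
import io


def normalize_newlines(raw: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF (hypothetical helper)."""
    # Replace CRLF first so the second pass does not turn it into two newlines.
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Hypothetical two-row sample that uses bare CR line endings.
sample = b"685,201603,CENTRALMTGECO\r776,201604,USBANKNA\r"

cleaned = normalize_newlines(sample)

# Parse the cleaned text with the standard csv module.
rows = list(csv.reader(io.StringIO(cleaned.decode("utf-8"))))
```

The cleaned bytes could also be wrapped in `io.BytesIO` and handed to `read_csv()`, which accepts file-like objects. Note that this blanket replacement would also alter carriage returns that legitimately appear inside quoted fields.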
Answer 0 (score: 0)
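Python strings have a replace method: cleaning_string = cleaning_string.replace(",", "")
The command above replaces every comma with "" (nothing), or with whatever you choose. replace returns a new string in which everything else is unchanged, just without the commas.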
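Since the question says the affected strings are a known subset, the replacement can be limited to those values so that the delimiter commas are left alone. This is a minimal sketch under that assumption: `KNOWN_VALUES`, `strip_known_commas`, and the shortened sample rows are hypothetical, and it uses the standard csv module only to show that the column count lines up afterwards.

```python
import csv
import io

# Hypothetical list of the known values that contain a stray comma.
KNOWN_VALUES = ("PNCBANK,NATL",)


def strip_known_commas(text, known_values=KNOWN_VALUES, replacement=""):
    """Replace the comma inside each known value, leaving delimiters intact."""
    for value in known_values:
        text = text.replace(value, value.replace(",", replacement))
    return text


# Hypothetical, shortened version of the rows in the question.
sample = (
    "715,201603,Other sellers,PNCBANK,NATL\n"
    "758,201603,Other sellers,USBANKNA\n"
)

cleaned = strip_known_commas(sample)

# Every row now splits into the same number of fields.
rows = list(csv.reader(io.StringIO(cleaned)))
```

Passing `replacement=" "` or another symbol keeps the two words apart instead of joining them. The cleaned text can be wrapped in `io.StringIO` and passed to `read_csv()` in place of a file path.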