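I am trying to run this code, which removes unnecessary columns from a data frame for later processing. It loops through the first file and then gives the error below. It was working fine before. I read that this might be caused by a corrupted file, so I deleted all the previous files and regenerated every file from each step, but I still get the error. Sorry if this is long-winded; I need to show every step for my thesis, and I am still a novice programmer. Can anyone solve this?
The code is: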
import pandas as pd
import os
path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    df = pd.read_csv(path+file)
    df = df.drop('Hits', axis=1)
    df = df.drop('Score', axis=1)
    df = df.drop('Score.1', axis=1)
    print(df)
    filename = os.path.splitext(file)
    (f, ext) = filename
    print(f)
    df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)
The error message is:
Traceback (most recent call last):
  File "/home/sandra/git/trees/trees/remove_columns.py", line 9, in <module>
    df = pd.read_csv(path+file)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/sandra/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'
Answer 0 (score: 1)
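This error is usually raised when the file pandas is trying to read is corrupted or not in a readable state. Here, a likely cause is that os.listdir(path) also returns the weighted_out subdirectory, which sits inside path, so pd.read_csv ends up being called on a directory rather than a file. Restricting the loop to .csv files, as in the modified code below, should work: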
import pandas as pd
import os
path = ('./Sketch_grammar/weighted/')
files = os.listdir(path)
for file in files:
    # Skip anything that is not a CSV file (e.g. the weighted_out directory)
    if file.endswith('.csv'):
        df = pd.read_csv(path+file)
        df = df.drop('Hits', axis=1)
        df = df.drop('Score', axis=1)
        df = df.drop('Score.1', axis=1)
        # Split "name.csv" into ("name", ".csv") to build the output name
        filename = os.path.splitext(file)
        (f, ext) = filename
        df.to_csv(path+'weighted_out/'+f+'_out.csv', index=False)
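
For comparison, here is a minimal sketch of the same idea using pathlib, which is a swapped-in alternative and not what the answer above uses. The function name strip_columns and its parameters are hypothetical, chosen only for illustration; the column names come from the question. It assumes that files lacking one of those columns should be passed through rather than raise an error, which is why it passes errors="ignore".

```python
from pathlib import Path

import pandas as pd

# Column names taken from the question.
COLUMNS_TO_DROP = ["Hits", "Score", "Score.1"]


def strip_columns(in_dir, out_dir, columns=COLUMNS_TO_DROP):
    """Drop `columns` from every CSV file directly inside in_dir.

    Hypothetical helper: results are written to out_dir as <name>_out.csv,
    and the list of written paths is returned.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    # Create the output directory if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob("*.csv") is not recursive, so files in subdirectories are ignored.
    for csv_path in sorted(in_dir.glob("*.csv")):
        # Guard against a directory whose name happens to end in .csv.
        if not csv_path.is_file():
            continue
        df = pd.read_csv(csv_path)
        # errors="ignore" skips columns that a given file does not contain.
        df = df.drop(columns=columns, errors="ignore")
        out_path = out_dir / f"{csv_path.stem}_out.csv"
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

With the layout from the question, this could be called as strip_columns('./Sketch_grammar/weighted/', './Sketch_grammar/weighted/weighted_out/'). Because the glob is not recursive, the output directory can stay inside the input directory without its files being picked up on a later run.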