I am running into a read_csv error while looping over a long list of files. Here is how I reproduce it.
Given the following pseudocode:
import pandas as pd

ll = []
for f in ec_files:
    fdf = pd.read_csv(f, dtype=dtypes, header=None,
                      parse_dates=[0, 1], index_col=1, names=colnames,
                      na_values=["NAN"], true_values=["t"],
                      false_values=["f"], low_memory=False)
    ll.append(pd.DataFrame(fdf.mean()).transpose())
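where ec_files is a long list of path names for the input files, colnames is a list of column names, and dtypes is a dictionary giving the dtype of each column. Reading any of the files individually works without problems, but when using the loop above, the process stops at a random file with the following traceback: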
Traceback (most recent call last):
  File "junk.py", line 15, in <module>
    false_values=["f"], low_memory=False)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 400, in _read
    data = parser.read()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 938, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1505, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 849, in pandas.parser.TextReader.read (pandas/parser.c:9907)
  File "pandas/parser.pyx", line 945, in pandas.parser.TextReader._read_rows (pandas/parser.c:11161)
  File "pandas/parser.pyx", line 1047, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:12536)
  File "pandas/parser.pyx", line 1126, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:13783)
ValueError: invalid literal for float(): 06-06 04:02:24.2
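It looks as if something breaks during parsing. Why does this happen only when using the loop, and not when reading the file named in the traceback on its own?
Hard as it may be to believe, the file where the error occurs changes every time I run the script, yet each of those files is read individually without any problem. Here is the traceback from another failed run of the script: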
Traceback (most recent call last):
  File "junk.py", line 16, in <module>
    false_values=["f"], low_memory=False)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 645, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 400, in _read
    data = parser.read()
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 938, in read
    ret = self._engine.read(nrows)
  File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1505, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 849, in pandas.parser.TextReader.read (pandas/parser.c:9907)
  File "pandas/parser.pyx", line 945, in pandas.parser.TextReader._read_rows (pandas/parser.c:11161)
  File "pandas/parser.pyx", line 1047, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:12536)
  File "pandas/parser.pyx", line 1126, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:13783)
ValueError: invalid literal for float(): 1.5635078898.16
Here are the first few lines of the file where this last traceback occurred:
2016-06-24 14:00:00,2016-06-24 14:00:00,-63.202653,67.693223,0.10,317.200,248.200,0.250,-0.770,-0.010,99.50,0.45,93.39,,,,1.12458829806343,1.56350788627135,1265.86,398.16,332.80,0.05078614,0.0061028,0.9117393,0.1835912,-0.4333494,-0.8065823,-0.649,1.14,-0.029,0.98,332.29,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2016-06-24 14:00:00,2016-06-24 14:00:00.1,-63.202653,67.693223,0.10,317.200,248.200,0.250,-0.770,-0.010,99.50,0.45,93.39,,,,1.12458829806343,1.56350788627135,1265.86,398.16,332.80,0.05210823,0.005970591,0.9118717,1.419696,-0.05049266,-1.156707,-0.638,1.139,-0.02,0.93,332.26,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2016-06-24 14:00:00,2016-06-24 14:00:00.2,-63.202653,67.693223,0.10,317.200,248.200,0.250,-0.770,-0.010,99.50,0.45,93.39,,,,1.12458829806343,1.56350788627135,1265.86,398.16,332.80,0.05038951,0.005441753,0.9117393,-0.2251475,0.1442362,0.6797946,-0.625,1.165,-0.017,0.95,332.27,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2016-06-24 14:00:00,2016-06-24 14:00:00.3,-63.202653,67.693223,0.10,317.200,248.200,0.250,-0.770,-0.010,99.50,0.45,93.39,,,,1.12458829806343,1.56350788627135,1265.86,398.16,332.80,0.05224044,0.0061028,0.9113424,-1.813954,-0.4432509,-0.2021224,-0.629,1.161,-0.041,0.97,332.28,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2016-06-24 14:00:00,2016-06-24 14:00:00.4,-63.202653,67.693223,0.10,317.200,248.200,0.250,-0.770,-0.010,99.50,0.45,93.39,,,,1.12458829806343,1.56350788627135,1265.86,398.16,332.80,0.05157939,0.00623501,0.9118717,0.1374433,-0.2023152,-1.166616,-0.595,1.166,-0.009,0.98,332.29,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
My pandas version:
In [20]: pd.__version__
Out[20]:
'0.19.0+git14-ga40e185'
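Again, this file (and every other file where these random errors occur) is read fine on its own with exactly the same read_csv command. I am afraid that reproducing this may require me to provide all of the files.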
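In the meantime, one way to narrow this down without sharing the data is to record which files fail and whether an immediate re-read of the same file succeeds. The following is only a minimal sketch: read_with_retry and max_attempts are hypothetical names that are not part of the script above, and it assumes the failure always surfaces as a ValueError, as in both tracebacks.

```python
import pandas as pd


def read_with_retry(path, max_attempts=3, **read_csv_kwargs):
    """Call pd.read_csv on path, retrying when parsing raises ValueError.

    Hypothetical helper for diagnosis only. Returns (frame, errors):
    errors holds the message of every failed attempt, and frame is
    None when all attempts failed.
    """
    errors = []
    for _ in range(max_attempts):
        try:
            frame = pd.read_csv(path, **read_csv_kwargs)
            return frame, errors
        except ValueError as exc:
            # Keep the message so the offending token can be inspected later.
            errors.append(str(exc))
    return None, errors
```

Inside the loop, fdf, errs = read_with_retry(f, dtype=dtypes, ...) with the same keyword arguments as before would replace the direct call. A file that fails once and then parses cleanly on the next attempt would point at the parser state or the environment rather than at the file contents.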
Thanks for any feedback, Seb