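I have a malformed CSV file that contains bad rows. Pandas raises an error message that includes the line number. Is it possible to get this number so I can use it in an except block?
From this error message: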
pandas.errors.ParserError: Expected 187 fields in line 55898, saw 188. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
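I would like to get this line (55898), write it to a separate log file, and then delete it. It would also be nice to get the "expected" number (187) and the "saw" number (188), so that I can write to the log file: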
Error at line 55898. Fields added : 1
Answer 0 (score: 2)
Use repr to get the error string, and use re to extract the numbers from it.
import re

import pandas

try:
    ...  # <code that raises exception>
except pandas.errors.ParserError as e:
    errorstring = repr(e)
    # Raw string so that \d reaches the regex engine unchanged.
    matchre = re.compile(r'Expected (\d+) fields in line (\d+), saw (\d+)')
    (expected, line, saw) = map(int, matchre.search(errorstring).groups())
    with open('error.log', "a+") as log:
        # The trailing newline keeps each logged error on its own line.
        log.write(f'Error at line {line}. Fields added : {saw - expected}.\n')
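The question also asks to move the offending line into a log file and delete it from the CSV. The following is only a rough sketch of that step using the standard library. The helper name pop_line and its parameters are hypothetical, and it assumes the number reported by pandas is a 1-based physical line number, which may not hold if quoted fields contain embedded newlines.

```python
import os
import tempfile


def pop_line(path, line_number, log_path, note=""):
    """Remove the 1-based line `line_number` from `path`, append it to `log_path`.

    Hypothetical helper: returns the removed line, or raises ValueError
    if the file has fewer lines than `line_number`.
    """
    removed = None
    directory = os.path.dirname(os.path.abspath(path))
    # Stream into a temporary file in the same directory so that the
    # original is only replaced once the copy has been fully written.
    with open(path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        for number, text in enumerate(src, start=1):
            if number == line_number:
                removed = text  # keep the bad line, do not copy it
            else:
                dst.write(text)
    if removed is None:
        os.remove(dst.name)
        raise ValueError(f"{path} has no line {line_number}")
    os.replace(dst.name, path)
    with open(log_path, "a") as log:
        if note:
            log.write(note + "\n")
        log.write(removed.rstrip("\r\n") + "\n")
    return removed
```

Inside the except block this could be called as pop_line('bad.csv', line, 'error.log', note=f'Error at line {line}. Fields added : {saw - expected}'), after which the read can be retried to surface the next bad line, if any.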
Answer 1 (score: 1)
First, pandas.errors.ParserError is just a fancy ValueError (see source). All that remains is to wrap the call in a try-except block and call str() on the exception:
import pandas as pd

try:
    pd.read_csv('bad.csv')
except pd.errors.ParserError as e:
    msg = str(e)
    # Extract numbers and reformat the message for your needs.
There is no other way, because pandas builds the message and passes it to ParserError as a str.
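As a minimal sketch of the "extract numbers and reformat" step, the message text can be parsed with a regular expression. The function name summarize_parser_error is hypothetical, and the pattern assumes the message wording shown in the question; that wording may differ between pandas versions or parser engines, so the function returns None when it does not match.

```python
import re

# Pattern based on the message quoted in the question (an assumption,
# not a documented pandas format).
FIELD_COUNT_RE = re.compile(r"Expected (\d+) fields in line (\d+), saw (\d+)")


def summarize_parser_error(message):
    """Return a one-line log entry, or None if the message has another shape."""
    match = FIELD_COUNT_RE.search(message)
    if match is None:
        return None
    expected, line, saw = map(int, match.groups())
    return f"Error at line {line}. Fields added : {saw - expected}"
```

In the except block above, summarize_parser_error(msg) would give the line to write to the log file; checking for None avoids an AttributeError when a ParserError is raised for a different reason.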