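I'm trying to find out whether an .xlsx file contains an @. I've been using pandas and it works fine, except when the first column of the Excel sheet is empty; then it fails. Any ideas how to rewrite the code to handle/skip empty columns?
Code: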
import pandas

df = pandas.read_excel(open(path,'rb'), sheetname=0)
out = 'False'
for col in df.columns:
    if df[col].str.contains('@').any():
        out = 'True'
        break
This is the error I get:
    df = pandas.read_excel(open(path,'rb'), sheetname=0)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 203, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 258, in __init__
    self.book = xlrd.open_workbook(file_contents=data)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x17Microso'
Answer 0 (score: 1)
If you want to check whether at least one cell is equal to a specific character/string:
import pandas as pd

def excel_has_str(filename, search='@'):
    return pd.read_excel(filename).astype(str).eq(search).any().any()
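The check above compares whole cells. A minimal sketch of the same expression applied to an in-memory DataFrame instead of a file (the name `frame_has_str` and the sample data are illustrative, not from the answer) shows that a cell such as `a@b.com` does not count as a match:

```python
import pandas as pd

def frame_has_str(df, search='@'):
    # Convert every cell to text, then test for exact equality with `search`.
    return bool(df.astype(str).eq(search).any().any())

df = pd.DataFrame({'name': ['alice', 'bob'], 'email': ['a@b.com', None]})
print(frame_has_str(df))                               # False: no cell is exactly '@'
print(frame_has_str(pd.DataFrame({'x': ['@', 'y']})))  # True
```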
If you want to check whether at least one cell contains a specific character/string:
def excel_contains_str(filename, search='@'):
    return pd.read_excel(filename) \
             .astype(str) \
             .apply(lambda x: x.str.contains(search)) \
             .any() \
             .any()
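The substring version can be tried the same way without a spreadsheet. A minimal sketch (the name `frame_contains_str` and the sample data are illustrative); it passes `regex=False`, which is an assumption on top of the answer: with the default `regex=True`, a search term such as `.` would be treated as a regular expression and match almost any cell.

```python
import pandas as pd

def frame_contains_str(df, search='@'):
    # Convert every cell to text, then test each column for a literal substring.
    # regex=False is an assumption; the answer relies on the default regex=True,
    # which behaves identically for '@'.
    return bool(
        df.astype(str)
        .apply(lambda col: col.str.contains(search, regex=False))
        .any()
        .any()
    )

df = pd.DataFrame({'name': ['alice', 'bob'], 'email': ['a@b.com', None]})
print(frame_contains_str(df))         # True: 'a@b.com' contains '@'
print(frame_contains_str(df, 'zzz'))  # False
```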
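Both `excel_has_str` and `excel_contains_str` handle empty cells and empty columns automatically: `astype(str)` turns every value, including `NaN`, into a string, so the `.str` accessor works on every column instead of failing on an all-empty (float) column.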
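A small sketch of why the `astype(str)` step matters, assuming the empty column arrives as an all-`NaN` column (a constructed DataFrame stands in for the spreadsheet here). It also shows `dropna(axis=1, how='all')` as an alternative way to skip fully empty columns before searching:

```python
import numpy as np
import pandas as pd

# An all-NaN column gets float64 dtype, not object/string.
df = pd.DataFrame({'empty': [np.nan, np.nan], 'email': ['a@b.com', 'c']})
print(df['empty'].dtype)  # float64

# Calling .str directly (as the question's loop does) fails on a non-string column.
try:
    df['empty'].str.contains('@')
except AttributeError as exc:
    print('AttributeError:', exc)

# Converting to str first turns NaN into the text 'nan', so .str works.
print(df['empty'].astype(str).str.contains('@').any())  # False

# Alternatively, drop columns that are entirely empty before searching.
non_empty = df.dropna(axis=1, how='all')
print(list(non_empty.columns))  # ['email']
```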
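Answer 1 (score: 1)
This might help. As the link explains, the file may actually be an HTML file with an .xlsx extension, or it may already be open in Excel. You can also try loading it by passing the path directly instead of an open file handle, and see what happens: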
pd.read_excel(path_of_file, sheet_name=0)
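The first hypothesis in this answer can be checked by looking at the file's leading bytes. A genuine .xlsx is a ZIP container starting with `PK\x03\x04`, a legacy .xls is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`, and an HTML export usually starts with `<html` or `<!DOCTYPE`. A minimal sketch (the function name `sniff_excel_format` and its return labels are hypothetical):

```python
import os
import tempfile

def sniff_excel_format(path):
    # Hypothetical helper: classify a file by its first 8 bytes.
    with open(path, 'rb') as f:
        head = f.read(8)
    if head.startswith(b'PK\x03\x04'):
        return 'xlsx'   # ZIP container, as used by real .xlsx files
    if head.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'xls'    # OLE2 compound file, as used by legacy .xls files
    if head.lstrip().lower().startswith((b'<html', b'<!doc')):
        return 'html'   # HTML saved with an Excel extension
    return 'unknown'

# Demo with a throwaway file whose bytes mimic an HTML export named .xlsx.
with tempfile.NamedTemporaryFile(suffix='.xlsx', delete=False) as tmp:
    tmp.write(b'<html><body><table></table></body></html>')
print(sniff_excel_format(tmp.name))  # html
os.remove(tmp.name)
```

If the result is `html`, `pandas.read_html` is usually a better fit than `read_excel`.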