Pandas: skipping empty columns in xlsx

Time: 2018-01-10 14:32:36

Tags: python excel pandas

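I am trying to find out whether an .xlsx file contains an @. I used pandas, which works well unless the first column of the Excel sheet is empty, in which case it fails. Any ideas on how to rewrite the code to handle or skip empty columns?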

Code:

import pandas

df = pandas.read_excel(open(path,'rb'), sheetname=0)
out = 'False'
for col in df.columns:
    if df[col].str.contains('@').any():
        out = 'True'
        break

This is the error I get:

    df = pandas.read_excel(open(path,'rb'), sheetname=0)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 203, in read_excel
    io = ExcelFile(io, engine=engine)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/excel.py", line 258, in __init__
    self.book = xlrd.open_workbook(file_contents=data)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1271, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "/anaconda3/lib/python3.6/site-packages/xlrd/book.py", line 1265, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x17Microso'

2 Answers:

Answer 0: (score: 1)

If you want to check whether at least one cell is equal to a specific character/string:

import pandas as pd

def excel_has_str(filename, search='@'):
    return pd.read_excel(filename).astype(str).eq(search).any().any()

If you want to check whether at least one cell contains a specific character/string:

def excel_contains_str(filename, search='@'):
    return pd.read_excel(filename) \
             .astype(str) \
             .apply(lambda x: x.str.contains(search)) \
             .any() \
             .any()

It automatically takes care of empty strings and empty columns...
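To try the "contains" variant without an Excel file at hand, here is a minimal sketch that applies the same idea to an in-memory DataFrame. The name `frame_contains` and the sample data are made up for illustration, and `regex=False` is an addition (not in the answer above) so that the search term is treated as a literal substring.

```python
import numpy as np
import pandas as pd

def frame_contains(df, search="@"):
    """Return True if any cell, rendered as text, contains `search`."""
    text = df.astype(str)
    # regex=False: treat `search` as a plain substring, not a pattern
    hits = text.apply(lambda col: col.str.contains(search, regex=False))
    return bool(hits.any().any())

# Hypothetical data: the first column is entirely empty (all NaN)
sample = pd.DataFrame({
    "empty": [np.nan, np.nan],
    "name": ["alice", "bob"],
    "mail": ["alice@example.com", None],
})

with_mail = frame_contains(sample)
without_mail = frame_contains(sample[["empty", "name"]])
print(with_mail, without_mail)
```

One caveat of the `astype(str)` approach: missing cells may be rendered as text such as `nan` or `None`, so a search term like `an` could match them. For `@` this does not matter.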

Answer 1: (score: 1)

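This might help. As the link says, it may be an HTML file with an xlsx extension, or the file may already be open in Excel. You can also try loading it this way and see what happens: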

# newer pandas spells this keyword sheet_name (sheetname was deprecated in 0.21)
pd.read_excel(path_of_file, sheet_name=0)
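The traceback in the question shows the file starting with b'\x17Microso', which is neither the ZIP signature of a real .xlsx nor the OLE2 signature of a legacy .xls. A rough way to check what a file actually is before handing it to pandas is to look at its first bytes. This is only a sketch using the standard library; the name `sniff_container` and the returned labels are made up for illustration.

```python
import os
import tempfile

def sniff_container(path):
    """Guess the container format from the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(64)
    if head.startswith(b"PK\x03\x04"):
        return "zip"      # .xlsx files are ZIP archives
    if head.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"     # legacy .xls compound file
    if head.lstrip().lower().startswith((b"<html", b"<!doc", b"<?xml")):
        return "markup"   # HTML or XML saved with an Excel extension
    return "unknown"

# Demo on a throwaway file that begins with the bytes from the traceback
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.xlsx")
    with open(demo_path, "wb") as fh:
        fh.write(b"\x17Microso" + b"\x00" * 16)
    demo_result = sniff_container(demo_path)

print(demo_result)
```

If the path comes from scanning a directory, it may also be worth skipping names that begin with `~$`: Excel creates such lock files next to a workbook while it is open, which could be what "already opened by Excel" refers to here.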