Here is the code:
xls = open_workbook('data.xls')
It returns:
File "/home/woles/P2/fin/fin/apps/data_container/importer.py", line 16, in import_data
xls = open_workbook('data.xlsx')
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/__init__.py", line 435, in open_workbook
ragged_rows=ragged_rows,
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 91, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1230, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "/home/woles/P2/fin/local/lib/python2.7/site-packages/xlrd/book.py", line 1224, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\r\n<html>'
The file is not corrupted; I can open it with Excel and LibreOffice.
Answer 0 (score: 1)
For .xls files, you can use read_excel():
df1= pd.read_excel("filename.xls")
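The header parameter can help you get rid of some errors (the pandas documentation has more information on the parameters). Note that sep is a read_csv() parameter, not a read_excel() one, so it is left out here. Usage example:
df2 = pd.read_excel("filename.xls", header=None)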
Note that if the file is actually a .csv, you will get the error message
XLRDError: Unsupported format, or corrupt file: Expected BOF record;
To read a .csv, you need to use read_csv() like this:
df3= pd.read_csv("filename.csv")
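Since the extension can lie about the content, it may help to look at the first bytes of the file before choosing a reader. The following is a minimal sketch, not part of pandas or xlrd: sniff_format is a hypothetical helper name, and the checks only cover the common cases (the OLE2 signature used by legacy .xls, the ZIP signature used by .xlsx, and a leading HTML tag such as the '\r\n<html>' in the traceback above).

```python
def sniff_format(path):
    """Guess the real format of a file from its first bytes (hypothetical helper)."""
    with open(path, 'rb') as f:
        head = f.read(512)
    # OLE2 compound document: the container used by legacy .xls workbooks
    if head.startswith(b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'):
        return 'xls'
    # ZIP archive: the container used by .xlsx workbooks
    if head.startswith(b'PK\x03\x04'):
        return 'xlsx'
    # Skip a UTF-8 byte order mark and leading whitespace, then look for markup
    text = head.lstrip(b'\xef\xbb\xbf \t\r\n').lower()
    if text.startswith((b'<html', b'<!doctype', b'<table')):
        return 'html'
    # Anything else is treated as plain text, possibly CSV
    return 'text'
```

With a result of 'html' you would reach for pd.read_html(), with 'text' for pd.read_csv(), and only with 'xls' or 'xlsx' for pd.read_excel().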
Answer 1 (score: 0)
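I have just solved the same error. The first step was to check what the file really was: I changed it to a text file and noticed HTML content. I then made some modifications so that, once saved as HTML and opened in a browser, the table would be visible. After that, Beautiful Soup and pandas helped me get a clean output.
Check whether the lines below help.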
import shutil
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Work on a text copy so the downloaded file stays untouched
shutil.copy('donloaded.xls', 'txt_output.txt')
with open('txt_output.txt', 'r') as f:
    txt = f.read()
# Modify the text to ensure the data displays in an HTML page
txt = txt.replace(r'<style> .text { mso-number-format:\@; } </script>', '')
# Add head and body in case they are missing from the HTML text
txt_with_head = '<html><head></head><body>' + txt + '</body></html>'
# Save the file as HTML; the with block closes it before it is read again
with open('output.html', 'w') as html_file:
    html_file.write(txt_with_head)
# Use Beautiful Soup to read the saved page (the lxml parser must be installed)
with open('output.html') as page:
    soup = BeautifulSoup(page.read(), features="lxml")
my_table = soup.find("table", attrs={'border': '1'})
# Wrap the markup in StringIO; newer pandas versions warn on literal HTML strings
frame = pd.read_html(StringIO(str(my_table)))[0]
print(frame.head())
frame.to_excel('testoutput.xlsx', sheet_name='sheet1', index=False)
Answer 2 (score: -1)
Try opening it with pandas:
import pandas as pd
# read_html returns a list of DataFrames, one per <table> found
data = pd.read_html('filename.xls')
Or try any other Python HTML parser.
This is not a proper Excel file, but an HTML document that Excel can read.
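As an illustration of the "any other HTML parser" route, here is a minimal sketch that uses only html.parser from the standard library. The names TableRows and html_table_rows are hypothetical, and the sketch assumes a single flat table (no nested tables, no rowspan or colspan handling), which may not match every exported file.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _flush_cell(self):
        # Store the open cell, if any, in the current row
        if self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        # Store the open row, if any, in the result
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._flush_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._flush_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._flush_cell()
        elif tag in ('tr', 'table'):
            self._flush_row()


def html_table_rows(markup):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableRows()
    parser.feed(markup)
    parser.close()
    return parser.rows
```

Reading the file as text and passing its content to html_table_rows() gives a list of rows that can be handed to pd.DataFrame or to the csv module.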