I am trying to read an .xlsx file from the following URL, but pd.read_excel raises an error even though the file downloads successfully from a browser.
http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0
import numpy as np
import pandas as pd
data=pd.read_excel("http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0")
The traceback is:
>>> data=pd.read_excel("http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  [...]
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x1f\x8b\x08\x00}\xceuZ'
Answer 0 (score: 1)
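The first four bytes shown, \x1f\x8b\x08\x00, make it clear that we're receiving a gzipped file (\x1f\x8b is the gzip magic number and \x08 is the deflate method byte), which pandas isn't automatically decompressing. We can do it ourselves, though (here url is the address from the question):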
In [54]: import urllib.request, gzip
In [55]: df = pd.read_excel(gzip.GzipFile(fileobj=urllib.request.urlopen(url)))
In [56]: df.iloc[:5, :5]
Out[56]:
دیده بان بازار : 1396/11/14 - زمان آخرین معامله : 14:42:03 \
0 نماد
1 فسا
2 فرآور
3 فملي2
4 وبملت
Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4
0 نام تعداد حجم ارزش
1 پتروشيمي فسا 512 5251556 3647287532
2 فرآوريموادمعدنيايران 310 694381 11249763313
3 ملي صنايع مس ايران 1 40949671 115887568930
4 بانك ملت 350 6350364 6561761997
The remaining tweaks to get the columns set appropriately aren't related to the problem in the post and so I'll leave those alone.
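
If it is unclear whether the server will always gzip the response, one option is to check for the gzip magic number before decompressing. The following is a minimal sketch under that assumption, not a definitive implementation: the helper names maybe_gunzip, fetch_excel_bytes and read_remote_excel are hypothetical and do not come from pandas. It reads the whole response into memory, which also gives pd.read_excel a seekable buffer.

```python
import gzip
import io
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def maybe_gunzip(raw: bytes) -> bytes:
    """Decompress raw if it starts with the gzip magic number, else return it unchanged."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def fetch_excel_bytes(url: str) -> io.BytesIO:
    """Download url and return a seekable in-memory buffer of the (decompressed) body."""
    with urllib.request.urlopen(url) as resp:
        return io.BytesIO(maybe_gunzip(resp.read()))


def read_remote_excel(url: str, **kwargs):
    """Hypothetical convenience wrapper: fetch, gunzip if needed, then parse with pandas."""
    import pandas as pd  # imported here so the helpers above stay stdlib-only

    return pd.read_excel(fetch_excel_bytes(url), **kwargs)
```

With these helpers, read_remote_excel(url) would stand in for the one-liner in the answer, and it should also cope with a response that is not compressed.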