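I am trying to access the file at https://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate=20180709&reportType=P&productId=425, but I am running into some difficulty. At first my progress was blocked by some bad request flags, but now that I am sending "User-Agent": "Mozilla/5.0", I get a proper response.
When I actually download the file (.xls), I notice that the same logo is pasted over and over in the top-left corner (from row 1 to about row 3). I have come to the conclusion that pandas cannot parse files that contain images. I have been searching, but I have not yet found an example that removes every image from an Excel file and keeps only the text.
My thought process was to somehow locate the objects on the specific sheet and delete them all until only the text data remains, but this has proven far more difficult than expected. The code below currently raises TypeError: unsupported operand type(s) for <<: 'str' and 'int'.
Any help or guidance would be greatly appreciated.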
def get_sheet(self):
    # Accesses CME direct URL (at the moment...will add functionality for ICE later)
    # Gets sheet and puts it in dataframe
    # Returns dataframe sheet
    sheet_url = "http://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate="+str(self.date_of_report)+"&reportType="\
        + str(self.report_type)+"&productId=" + str(self.product)
    header = {
        "User-Agent": "Mozilla/5.0"
    }
    req = requests.get(url = sheet_url, headers = header)
    file_obj = io.StringIO(req.content.decode('ISO-8859-1'))
    data_sheet = pd.read_excel(file_obj)
    return data_sheet
Edit: see the full stack trace below.
Traceback (most recent call last):
  File "OI_driver.py", line 16, in <module>
    OI_driver()
  File "OI_driver.py", line 10, in OI_driver
    front_month = mgd.Month_Data(product_dict["LO"], "06/27/2018", "P")
  File "D:\Open Interest Report Dev\month_graph_data.py", line 12, in __init__
    self.data_sheet = self.get_sheet()
  File "D:\Open Interest Report Dev\month_graph_data.py", line 30, in get_sheet
    data_sheet = pd.read_excel(file_obj)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 177, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 177, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\pandas\io\excel.py", line 307, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\pandas\io\excel.py", line 392, in __init__
    self.book = xlrd.open_workbook(file_contents=data)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\xlrd\__init__.py", line 162, in open_workbook
    ragged_rows=ragged_rows,
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\xlrd\book.py", line 91, in open_workbook_xls
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\xlrd\book.py", line 1267, in getbof
    opcode = self.get2bytes()
  File "C:\Users\Tyler\Anaconda3\lib\site-packages\xlrd\book.py", line 672, in get2bytes
    return (BYTES_ORD(hi) << 8) | BYTES_ORD(lo)
TypeError: unsupported operand type(s) for <<: 'str' and 'int'
Answer 0 (score: 1)
How about saving the content to a local file first?
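import requests
import pandas as pd

url = "https://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate=20180709&reportType=P&productId=425"
# The question reports that the server only responds properly when a User-Agent is sent.
req = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
xls_file = "tmp.xls"
# req.content is bytes, so the file has to be opened in binary mode ("wb").
with open(xls_file, "wb") as f:
    f.write(req.content)
ds = pd.read_excel(xls_file)
print(ds)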
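When saving the response to disk, the file has to be opened in binary mode ("wb"): in Python 3 `req.content` is `bytes`, and writing bytes to a text-mode file raises a TypeError. The sketch below is one possible way to do the save step without leaving a fixed `tmp.xls` behind. The helper name `save_bytes` is hypothetical, and the `payload` bytes are only a placeholder standing in for `req.content` so the example runs without a network call.

```python
import os
import tempfile


def save_bytes(content: bytes, suffix: str = ".xls") -> str:
    """Write raw bytes to a new temporary file and return its path.

    Hypothetical helper for illustration; not part of requests or pandas.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    # Binary mode keeps the payload byte-for-byte identical on disk.
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


# Placeholder bytes standing in for req.content (an assumption for this demo).
payload = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

path = save_bytes(payload)
with open(path, "rb") as f:
    round_trip = f.read()
os.remove(path)  # clean up once the file has been read
```

In real use, the path returned by `save_bytes(req.content)` would be passed to `pd.read_excel` before the file is removed.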
Answer 1 (score: 0)
This works for me:
import requests
import io
import pandas as pd
url = '......'
response = requests.get(url, stream=True, headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3970.5 Safari/537.36'})
file_obj = io.BytesIO(response.content)
df = pd.read_excel(file_obj)
print(df)
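A likely reason `io.BytesIO` works where `io.StringIO` fails: the traceback in the question ends in xlrd doing `hi << 8` on a single element of the file contents. Indexing `bytes` gives an `int`, which can be shifted, while indexing a decoded `str` gives a one-character `str`, which cannot. The following is a minimal sketch of that difference only; it does not call xlrd, and the two sample bytes are arbitrary placeholders rather than real file data.

```python
import io

# Two arbitrary sample bytes (placeholder data, not a real workbook).
raw = b"\xd0\xcf"

as_bytes = io.BytesIO(raw).read()                        # bytes
as_text = io.StringIO(raw.decode("ISO-8859-1")).read()   # str

# Indexing bytes yields ints, so shifting and OR-ing works.
opcode = (as_bytes[1] << 8) | as_bytes[0]

# Indexing str yields a one-character str, so the shift raises TypeError.
try:
    as_text[1] << 8
    shift_error = None
except TypeError as exc:
    shift_error = str(exc)

print(hex(opcode))
print(shift_error)
```

So the images in the sheet are probably not the cause of the error; passing the undecoded `response.content` through `io.BytesIO` keeps the data as bytes all the way to the Excel parser.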