Question

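I am trying to fetch quote data for every listing in the S&P 500, following the Python Programming for Finance tutorial (link). Unfortunately, I get the following error when I run my code: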

requests.exceptions.ContentDecodingError: ('Received response with 
content-encoding: gzip, but failed to decode it.', error('Error -3 while
decompressing data: incorrect data check',))

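I am guessing the problem comes from different encodings for different stocks. How can I change my code (shown below) to allow for gzip decoding?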

import bs4 as bs
import pickle
import requests
import datetime as dt
import os
import pandas as pd
import pandas_datareader.data as web

def save_sp500_tickers():
    # Retrieve the page source from the URL
    response = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')

    # Parse the page source with BeautifulSoup
    soup = bs.BeautifulSoup(response.text, 'lxml')

    # Find the table tag with class "wikitable sortable"
    table = soup.find('table', {'class': 'wikitable sortable'})

    # Collect the ticker symbols here
    tickers = []

    # Skip the header row, then read the first cell of each remaining row
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)

    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)

    print(tickers)

    return tickers

def getDataFromYahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)

    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')

    start = dt.datetime(2010, 1, 1)
    end = dt.datetime(2018, 7, 26)

    for ticker in tickers:
        print(ticker)
        # Same directory name as the one created above
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'yahoo', start, end)
            # Save the result so the existence check can skip it next time
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

getDataFromYahoo()

Traceback (most recent call last):

  File "C:\Users\dan gilmore\Desktop\EclipseOxygen64WCSPlugin\cherryPY\S7P\index.py", line 55, in <module>
    getDataFromYahoo()
  File "C:\Users\dan gilmore\Desktop\EclipseOxygen64WCSPlugin\cherryPY\S7P\index.py", line 51, in getDataFromYahoo
    df = web.DataReader(ticker, 'yahoo', start, end)
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\data.py", line 311, in DataReader
    session=session).read()
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\base.py", line 210, in read
    params=self._get_params(self.symbols))
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\yahoo\daily.py", line 129, in _read_one_data
    resp = self._get_response(url, params=params)
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas_datareader\base.py", line 132, in _get_response
    headers=headers)
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 525, in get
    return self.request('GET', url, **kwargs)
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\sessions.py", line 662, in send
    r.content
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\models.py", line 827, in content
    self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
  File "C:\Users\dan gilmore\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\models.py", line 754, in generate
    raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect data check',))

Answer 1

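The fundamental problem here is that you are following an out-of-date tutorial.

If you look at the docs for pandas-datareader, there is a big box right at the top that says:

Warning

As of v0.6.0 Yahoo!, Google Options, Google Quotes and EDGAR have been immediately deprecated due to large changes in their API and no stable replacement.

Whenever you are following a tutorial or blog post and something doesn't work, the first thing you should do is look at the actual documentation for whatever it is teaching you to use. Things change, and things that wrap web APIs change especially quickly.

Anyway, if you scroll down to the list of data sources, you will see that there is no Yahoo entry. But the code is still there in the source. So instead of getting an error saying there is no such source, you get an error a bit later, from trying to use a broken source.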

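On the surface, what is happening is that the datareader code makes some kind of request (you would have to dig into the library, or capture the traffic with Wireshark, to see what the URL and headers are) and gets back a response that claims to use gzip content-encoding but does so incorrectly.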
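One way to look at such a response without reading library code is to fetch the URL yourself with a client that does not decode the body automatically. The sketch below is only an illustration, using the standard library's `urllib.request`; the helper names `describe_body` and `probe` are made up for this example, and you would still need to find out which URL pandas-datareader actually requests.

```python
import urllib.request
import zlib


def describe_body(content_encoding, raw):
    """Report whether a raw body matches its declared Content-Encoding."""
    if content_encoding.lower() != "gzip":
        return "not gzip-encoded ({} bytes)".format(len(raw))
    try:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decoded = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    except zlib.error as exc:
        return "gzip declared but decoding failed: {}".format(exc)
    return "valid gzip ({} bytes decoded)".format(len(decoded))


def probe(url, timeout=10):
    """Fetch url and return (status, headers, verdict) without auto-decoding."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()  # urllib leaves the body exactly as sent
        headers = dict(response.headers.items())
        encoding = response.headers.get("Content-Encoding", "")
        return response.status, headers, describe_body(encoding, raw)
```

Calling `probe(url)` on the failing URL should show the status code, the response headers, and whether the body really is valid gzip, which is roughly the information a Wireshark capture would give you.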

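Content-encoding is something the web server applies to a page and the browser or client undoes. It is usually compression, so that the page takes less time to send over the network. gzip is the most common form of compression. It is a very simple format, which is why it is so widely used (a server can gzip thousands of pages without needing a farm of supercomputers), but that also means that if something goes wrong (for example, the server just truncates the stream after 16KB or something similar), you can't really tell what went wrong, other than that the gzip decompression failed.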
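To make that concrete, here is a small self-contained sketch that damages a gzip stream on purpose and tries to decode it. The payload and the helper name `try_decode` are invented for illustration; flipping a bit in the checksum stored at the end of the stream should produce the same "incorrect data check" message as in the question, and it says nothing about why the data was bad.

```python
import gzip
import zlib

# 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
GZIP_WBITS = 16 + zlib.MAX_WBITS


def try_decode(body):
    """Return (data, None) on success or (None, error message) on failure."""
    try:
        return zlib.decompress(body, GZIP_WBITS), None
    except zlib.error as exc:
        return None, str(exc)


# Made-up payload standing in for a CSV response body.
payload = b"Date,Open,High,Low,Close\n" * 1000
intact = gzip.compress(payload)

# The last 8 bytes of a gzip stream are a CRC-32 followed by the length.
# Flip one bit of the CRC so the checksum no longer matches the data.
damaged = bytearray(intact)
damaged[-8] ^= 0x01
corrupted = bytes(damaged)

# Simulate a server that stops sending partway through the stream.
truncated = intact[: len(intact) // 2]

for label, body in [("intact", intact), ("corrupted", corrupted), ("truncated", truncated)]:
    data, error = try_decode(body)
    print(label, "->", "ok" if error is None else error)
```

In both failure cases the only information available is a zlib error; nothing in the stream tells the client what the server did wrong.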

But either way, there is no way to fix this;¹ you have to rewrite your code to use a different data source.

If you don't understand the code well enough to do that, you will have to find a more up-to-date tutorial to learn from.
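If you do rewrite it, one option is to keep the download loop independent of the data source, so that swapping sources means changing only one function. The following is a rough sketch under that assumption, not a drop-in replacement: `download_all` and its `fetch` parameter are hypothetical names, and `fetch` is assumed to return an object with a `to_csv(path)` method, as a pandas DataFrame does.

```python
import os


def download_all(tickers, fetch, out_dir="stock_dfs"):
    """Save one CSV per ticker; return {ticker: error message} for failures."""
    os.makedirs(out_dir, exist_ok=True)
    failures = {}
    for ticker in tickers:
        path = os.path.join(out_dir, "{}.csv".format(ticker))
        if os.path.exists(path):
            # Already downloaded on an earlier run.
            continue
        try:
            # fetch is supplied by the caller and hides the data source.
            frame = fetch(ticker)
            frame.to_csv(path)
        except Exception as exc:
            # Record the failure and keep going with the other tickers.
            failures[ticker] = "{}: {}".format(type(exc).__name__, exc)
    return failures
```

With the imports from the question, a call might look like `download_all(tickers, lambda t: web.DataReader(t, 'stooq', start, end))`, assuming that source appears in the data source list for your installed version of pandas-datareader. Collecting failures instead of stopping at the first one also shows whether every ticker fails or only some of them.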

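¹ Unless you want to figure out the new Yahoo API (assuming there is one), work out how to parse it, and write a whole new pandas-datareader source, even though the experts who wrote that library have given up on trying to deal with Yahoo...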
