Python: decompressing a gzip CSV in the pandas csv reader

Date: 2017-12-23 01:11:40

Tags: python python-2.7 pandas python-requests gzip

The following code works in Python 3 but fails in Python 2:

import gzip
import pandas as pd
import requests

def get_historical():
    r = requests.get("http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz", stream=True)
    decompressed_file = gzip.GzipFile(fileobj=r.raw)
    data = pd.read_csv(decompressed_file, sep=',')
    data.columns = ["timestamp", "price", "volume"]  # set df col headers
    return data

The error I get in Python 2 is:

TypeError: 'int' object has no attribute '__getitem__'

The error occurs on the line where I set data to pd.read_csv(...).

It looks like a pandas bug to me.

Stack trace:

Traceback (most recent call last):
  File "fetch.py", line 51, in <module>
    print(f.get_historical())
  File "fetch.py", line 36, in get_historical
    data = pd.read_csv(f, sep=',')
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
io.UnsupportedOperation: seek

1 Answer:

Answer 0: (score: 3)

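The traceback you posted stems from the fact that the Response object's raw attribute is a file-like object that does not support the .seek method that typical file objects support. However, when ingesting a file object through pd.read_csv, pandas (on Python 2) appears to use the seek method of the file object it is given.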

You can confirm that the raw data of the returned response is not seekable by calling r.raw.seekable(), which should normally return False.
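The difference between a forward-only stream and an in-memory buffer can be checked without any network access. The following is a minimal sketch: NonSeekableStream is a hypothetical stand-in for r.raw (it is not part of requests), built on io.RawIOBase, whose default seekable() returns False.

```python
import io


class NonSeekableStream(io.RawIOBase):
    """Hypothetical stand-in for r.raw: readable, but forward-only."""

    def __init__(self, payload):
        super().__init__()
        self._buffer = io.BytesIO(payload)

    def readable(self):
        return True

    def readinto(self, b):
        # Copy the next chunk into the caller's buffer, as a socket would.
        chunk = self._buffer.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)


stream = NonSeekableStream(b"1314964052,2.60,0.4\n")
stream_seekable = stream.seekable()      # default from io.IOBase
buffered = io.BytesIO(stream.read())     # copy everything into memory
buffered_seekable = buffered.seekable()  # BytesIO supports seek()

print(stream_seekable, buffered_seekable)
```

Calling stream.seek(0) on such an object raises io.UnsupportedOperation, which is the exception at the bottom of the traceback above.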

A way around this may be to wrap the returned data in an io.BytesIO object, as follows:

import gzip
import io
import pandas as pd
import requests

# file_url = "http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz"
file_url = "http://api.bitcoincharts.com/v1/csv/aqoinEUR.csv.gz"
r = requests.get(file_url, stream=True)
dfile = gzip.GzipFile(fileobj=io.BytesIO(r.raw.read()))
data = pd.read_csv(dfile, sep=',')
print(data)

            0     1    2
0  1314964052  2.60  0.4
1  1316277154  3.75  0.5
2  1316300526  4.00  4.0
3  1316300612  3.80  1.0
4  1316300622  3.75  1.5

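As you can see, I used a smaller file from the directory of available files. You can switch it to the file you want. In any case, io.BytesIO(r.raw.read()) should be seekable, and should therefore help you avoid the io.UnsupportedOperation exception you are encountering.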
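The buffering step can also be tried offline. This is only a sketch under stated assumptions: open_buffered_gzip is a hypothetical helper name, the sample rows are copied from the output shown earlier, and an in-memory io.BytesIO stands in for r.raw. The standard csv module is used here only to keep the example dependency-free.

```python
import csv
import gzip
import io


def open_buffered_gzip(raw):
    """Read a forward-only stream fully, then return a seekable gzip reader."""
    compressed = io.BytesIO(raw.read())  # in-memory, seekable copy
    return gzip.GzipFile(fileobj=compressed)


# Build a small gzip-compressed CSV in memory instead of downloading one.
sample_csv = b"1314964052,2.60,0.4\n1316277154,3.75,0.5\n"
payload = io.BytesIO()
with gzip.GzipFile(fileobj=payload, mode="wb") as gz:
    gz.write(sample_csv)

raw = io.BytesIO(payload.getvalue())  # hypothetical stand-in for r.raw
dfile = open_buffered_gzip(raw)
rows = list(csv.reader(io.TextIOWrapper(dfile, encoding="utf-8")))
print(rows)
```

With pandas, the object returned by open_buffered_gzip(r.raw) could be passed to pd.read_csv in the same way as dfile in the answer's code. Note that this approach holds the whole compressed download in memory, which is acceptable for small files but may matter for large ones.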

As for the TypeError exception, it does not occur in this snippet of code.

I hope this helps.