Python requests: streaming a download asynchronously

Date: 2018-12-20 15:41:27

Tags: python python-3.x python-requests python-asyncio

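I am trying to download a large file with the Python requests library by setting stream=True.

However, I want this function to run asynchronously and send the response back to the server while the download continues in the background.

Here is my code: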

async def downloadFile(url, filename):
  r = requests.get(url, stream=True)
  with open(os.path.join('./files', filename), 'wb+') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
  # Creating same file name
  # with _done appended to know that file has been downloaded
  with open(os.path.join('./files', filename + '_done'), 'w+') as f: 
    f.close()
  await asyncio.sleep(1)

I call this function from another function, like this:

# check if file exist in server
        if(os.path.exists(os.path.join('./files', fileName))):

            #file exist!!!

            #check if done file exist
            if(os.path.exists(os.path.join('./files', fileName + '_done'))):

                #done file exist
                self.redirect(self.request.protocol + "://" +
                              self.request.host + '/files/' + fileName)
            else:
                #done file not exist. Wait for 5 min more

                self.write('Wait 5 min')
                self.finish()
        else:
            # file doesnt exist. Initiate download
            self.write('Wait 5 min')
            self.finish()
            d = asyncio.ensure_future(downloadFile(
                fileRes, fileName))
            # loop = asyncio.get_event_loop()
            # loop.run_until_complete(d)

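The problem is that the file is created, but its size stays at 0, and the file with "_done" appended to its name is never created. What am I doing wrong here?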

1 Answer:

Answer 0 (score: 1)

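Your code works for me. Maybe the resource you are trying to fetch is what is not working.

You may want to try enabling debug for requests, as @brennan suggested, and/or add print statements to your code to follow what is happening: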

>>> import os
>>> import requests
>>> import asyncio
>>> 
>>> 
>>> async def downloadFile(url, filename):
...   print(f"• downloadFile({url}, {filename})")
...   r = requests.get(url, stream=True)
...   print(f" → r: {r}")
...   with open(os.path.join('./files', filename), 'wb+') as f:
...     print(f" → f is opened: {f}")
...     for chunk in r.iter_content(chunk_size=1024):
...         print(f"  → chunk is: {chunk}")
...         if chunk:
...             f.write(chunk)
...   # Creating same file name
...   # with _done appended to know that file has been downloaded
...   with open(os.path.join('./files', filename + '_done'), 'w+') as f:
...     print(f" → creating output with _done")
...     f.close()
...   print(f" → wait 1")
...   await asyncio.sleep(1)
... 
>>> 
>>> 
>>> d = asyncio.ensure_future(downloadFile('https://xxx/yyy.jpg', 'test.jpg'))
>>> loop = asyncio.get_event_loop()
>>> loop.run_until_complete(d)
• downloadFile(https://xxx/yyy.jpg, test.jpg)
 → r: <Response [200]>
 → f is opened: <_io.BufferedRandom name='./files/test.jpg'>
  → chunk is: b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\r\t\n\x0b\n\x08\r\x0b\n\x0b\x0e\x0e\r\x0f\x13....'
  → chunk is: ...
  ...
 → creating output with _done
 → wait 1
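
Enabling debug output for requests, as mentioned above, is commonly done with the standard logging module together with the debuglevel attribute of http.client. The following is only a minimal sketch, assuming the HTTP layer logs through the "urllib3" logger as current releases of requests do; enable_requests_debug is a hypothetical helper name, and this is not necessarily the exact approach @brennan had in mind:

```python
import http.client
import logging


def enable_requests_debug():
    """Turn on verbose output for HTTP traffic (hypothetical helper name)."""
    # Print raw request/response headers from the standard library HTTP layer.
    http.client.HTTPConnection.debuglevel = 1
    # Make sure log records are actually shown somewhere.
    logging.basicConfig(level=logging.DEBUG)
    # Assumption: requests delegates to urllib3, which logs under this name.
    logger = logging.getLogger("urllib3")
    logger.setLevel(logging.DEBUG)
    logger.propagate = True
    return logger
```

Calling enable_requests_debug() once, before the first requests.get(...), should be enough to see connection attempts and response statuses in the console.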

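The print output in the session above makes the _done part of your code unnecessary (the print output is all you need). The same goes for the sleep at the end (once it is done, it is done!).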

async def downloadFile(url, filename):
  r = requests.get(url, stream=True)
  with open(os.path.join('./files', filename), 'wb+') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)

That said, you may want to catch any problems that can occur while connecting to the server and act accordingly:

async def downloadFile(url, filename):
  try:
    r = requests.get(url, stream=True)
    r.raise_for_status() # to raise on invalid statuses
    with open(os.path.join('./files', filename), 'wb+') as f:
      for chunk in r.iter_content(chunk_size=1024):
          if chunk:
              f.write(chunk)
  except requests.RequestException as err:
    # do something smart when that exception occurs!
    print(f"Exception has occurred: {err}")
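
One thing to keep in mind with the versions above: requests is a synchronous library, so requests.get and iter_content block the event loop for as long as they run, even inside an async def. If the goal is a download that truly runs in the background, one option is to push the blocking work into a thread with run_in_executor. This is only a sketch under assumptions: write_chunks, download_in_background and the fetch_chunks parameter are hypothetical names, not part of the original code.

```python
import asyncio
import os


def write_chunks(chunks, path):
    """Blocking helper: write every non-empty chunk to path, return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks, as in the original loop
                written += f.write(chunk)
    return written


async def download_in_background(fetch_chunks, url, path):
    """Run the blocking fetch and write in the default thread pool.

    fetch_chunks is a hypothetical callable taking a URL and returning an
    iterable of bytes. With requests it could be, for example:
        lambda u: requests.get(u, stream=True).iter_content(chunk_size=1024)
    """
    loop = asyncio.get_running_loop()

    def work():
        # Both the request and the file write happen in the worker thread,
        # so the event loop stays free to serve other handlers.
        return write_chunks(fetch_chunks(url), path)

    return await loop.run_in_executor(None, work)
```

The coroutine can still be scheduled with asyncio.ensure_future(download_in_background(...)) as in the question. Passing the fetch function in as a parameter is a design choice made here so the sketch can be tried with a fake source of chunks before pointing it at a real URL.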