Getting empty responses when making concurrent requests

Date: 2019-03-02 19:30:13

Tags: python python-3.x asynchronous request web-crawler

I'm trying to make requests based on a list of URLs and then check the content of each response. I found what looked like a good approach on SO: if I print each request's response object, I get the correct result, but as soon as I try to read the content, it's empty... why?

import pandas as pd
import concurrent.futures
import requests
import time

out = []
CONNECTIONS = 100
TIMEOUT = 5

tlds = open('rez.txt').read().splitlines()
urls = ['http://{}'.format(x) for x in tlds[1:]]

def load_url(url, timeout):
    ans = requests.head(url, timeout=timeout)
    return ans

with concurrent.futures.ThreadPoolExecutor(max_workers=CONNECTIONS) as executor:
    future_to_url = (executor.submit(load_url, url, TIMEOUT) for url in urls)
    time1 = time.time()
    for future in concurrent.futures.as_completed(future_to_url):

        try:
            data = future.result()
            #why is data.content empty??
            print(data.content)
        except Exception as exc:
            data = str(type(exc))
        finally:
            out.append(data)
            #print(out)
            print(str(len(out)),end="\r")

    time2 = time.time()

print(f'Took {time2-time1:.2f} s')
print(pd.Series(out).value_counts())

Output of this code:

b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
b''
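The empty content follows from HTTP semantics: a response to a HEAD request carries headers but no message body, so `requests.head(...).content` is always `b''`; fetching the body requires a GET. Below is a minimal sketch of the difference against a throwaway local server (the server, port, and `b"hello"` payload are purely illustrative, not part of the original question):

```python
import http.server
import threading

import requests

class Handler(http.server.BaseHTTPRequestHandler):
    def _respond(self, send_body):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if send_body:
            self.wfile.write(body)

    def do_GET(self):
        self._respond(send_body=True)

    def do_HEAD(self):
        # Per the HTTP spec, a HEAD response has no message body.
        self._respond(send_body=False)

    def log_message(self, *args):
        pass  # silence per-request logging

# Spin up a local server on a random free port.
server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

head_resp = requests.head(url, timeout=5)
get_resp = requests.get(url, timeout=5)
print(head_resp.content)  # b''  -- HEAD carries no body
print(get_resp.content)   # b'hello'
server.shutdown()
```

So in the code above, replacing `requests.head` with `requests.get` in `load_url` would populate `data.content`, at the cost of downloading each body.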

Original thread: Getting HEAD content with Python Requests

0 answers:

No answers