How can I fetch a list of URLs in bulk with requests?

Asked: 2018-08-21 17:14:08

Tags: python python-3.x concurrency python-multithreading

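I have a list of IDs that I need to pass to an API.

I have successfully turned each ID into a URL string, so I now have a list of about 300k URLs (one per ID).

I want to collect the text part of each API response in a list.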

I can do this by taking each ID and passing it into the URL with a for loop, one request at a time:

import time
import requests

lst = []
L = [1, 2, 3]
for i in L:
    url = 'url&Id={}'.format(i)
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    time.sleep(1)
    print(xml_data1)

I have been trying to use the concurrent.futures and urllib.request libraries to send multiple requests at once, but I keep getting this error message from my code:

username=xxxx&password=xxxx&Id=1' generated an exception: 'HTTPResponse' object has no attribute 'readall'

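How can I adapt the for loop I already have, or the code above, to make multiple API calls at once?

I ask because my connection keeps getting reset during the for loop, and I don't know how to resume from the ID or URL where it broke off.

Using Python 3.6.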
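One way to pick up where a reset connection left off is to key the results by ID and skip any ID that already has a result. The following is only a minimal sketch under that assumption: `fetch_with_resume`, `fetch`, `retries`, and `delay` are hypothetical names that do not appear in the question, and `fetch` stands for any callable that takes an ID and returns the response text.

```python
import time


def fetch_with_resume(ids, fetch, results=None, retries=3, delay=1.0):
    """Fetch each ID once, skipping IDs that already have an entry in `results`.

    `fetch` is a caller-supplied callable (hypothetical here), for example:
        lambda i: requests.get('url&Id={}'.format(i)).text
    """
    if results is None:
        results = {}
    for i in ids:
        if i in results:
            continue  # already fetched on an earlier run
        for attempt in range(1, retries + 1):
            try:
                results[i] = fetch(i)
                break
            except OSError as exc:
                # ConnectionResetError and the requests exceptions derive from OSError
                print('ID {!r} failed (attempt {}): {}'.format(i, attempt, exc))
                time.sleep(delay * attempt)  # wait a little longer after each failure
    return results
```

An ID whose retries all fail is simply left out of `results`, so calling the function again with the same dictionary retries only the missing IDs. Saving the dictionary to disk between runs would extend the same idea across restarts.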

Edit:

I started from the code in "Python requests with multithreading" and applied it here,

where lst is the list of URLs.

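import concurrent.futures
import urllib.request

lst = ['url.com', 'url2.com']

URLS = lst

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    # read() replaces readall(), which HTTPResponse does not provide in Python 3.6
    return conn.read()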

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result() 
            # do json processing here
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

This code does not seem to produce an error message, but how do I append response.text to a list from it?
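A possible way to gather the text with the same ThreadPoolExecutor pattern is to map each future to its position in the URL list and write the decoded body into that slot. This is a sketch, not a definitive implementation: `load_text` and `fetch_all` are hypothetical names, decoding as UTF-8 is an assumption about the API, and the `fetch` parameter exists only so another downloader can be swapped in.

```python
import concurrent.futures
import urllib.request


def load_text(url, timeout=60):
    """Download one URL and decode the body as text (UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read().decode('utf-8', errors='replace')


def fetch_all(urls, fetch=load_text, max_workers=5):
    """Return (texts, failed): texts in input order, plus the URLs that raised."""
    texts = [None] * len(urls)
    failed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which list position each future belongs to
        future_to_index = {executor.submit(fetch, url): n for n, url in enumerate(urls)}
        for future in concurrent.futures.as_completed(future_to_index):
            n = future_to_index[future]
            try:
                texts[n] = future.result()
            except Exception as exc:
                failed.append(urls[n])
                print('%r generated an exception: %s' % (urls[n], exc))
    return texts, failed
```

Called as `texts, failed = fetch_all(lst)`, this leaves `None` in the slot of every URL that failed, and `failed` can be passed to `fetch_all` again to retry just those URLs.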

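1 answer:

Answer 0 (score: 1)

As suggested here:

grequests: Python requests with multithreading

It will not adapt directly to the code you already have, and you may have to rewrite it with a different library, but it sounds like a better fit for your needs.

Following up on our discussion, please see the code below, which illustrates what to change.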

import grequests

lst = ['https://www.google.com', 'https://www.google.cz']


class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    # Renamed from async(): 'async' is a reserved keyword from Python 3.7 on
    def fetch_all(self):
        # size=5 limits the number of requests in flight at once
        return grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        # Failed requests show up as None here because the handler above returns None
        return [x.text for x in results if x is not None]


test = Test()
# Collect the responses returned by fetch_all()
results = test.fetch_all()
response_text = test.collate_responses(results)