How do I prevent concurrent.futures from looping over previously iterated items?

Asked: 2018-11-05 14:17:36

Tags: python python-3.x concurrent.futures

I have a list (lst1) of numeric IDs (about 300K IDs) that I pass to an API, appending each API result to another list (lst), as shown below:

import requests

lst = []
lst1 = [1,2,3,4,5,6]

print(len(lst1))
counter = 0
for i in lst1:
    url = 'url.com/Id={}'.format(i)
    while True:
        try:
            xml_data1 = requests.get(url).text
            counter = counter+ 1
            print(counter)
            #print(xml_data1)
            break
        except requests.exceptions.RequestException as e:
            print(e)
    lst.append(xml_data1)

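When I apply the concurrent.futures library, the code keeps looping over the same IDs. I can tell because the counter values keep repeating. How do I prevent this?

Here is how I applied the concurrent.futures library: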

def get_data(xml):
    print(len(lst1))
    #counter = 0
    for i in lst1:
        url = 'url.com/Id={}'.format(i)
        while True:
            try:
                xml_data1 = requests.get(url).text
                counter = counter+ 1
                print(counter)
                #print(xml_data1)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        lst.append(xml_data1)

with futures.ThreadPoolExecutor() as executor:  
    df_list = executor.map(get_data, lst1)

Edit:

def get_data(xml):
    #counter = 0
    for i in lst1:
        url = 'url.com/Id={}'.format(i)
        while True:
            try:
                xml_data1 = requests.get(url).text
                counter = next(counter_object)
                print(counter)
                #print(xml_data1)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        lst.append(xml_data1)
    return lst
with futures.ThreadPoolExecutor() as executor:  
    lst = executor.map(get_data, lst1)

1 answer:

Answer 0 (score: 2)

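Integers are immutable, so counter = counter+ 1 inside the function rebinds a local name instead of updating the shared value. You could make counter a global by declaring global counter inside the function.

You could also use itertools.count to define a global counter object (not an integer).

That is my preferred method because it avoids using global on immutable objects such as integers, which easily leads to bugs and misunderstandings.

import itertools
counter_object = itertools.count()  # default: starts at 0

Now:

counter = counter+ 1

becomes:

counter = next(counter_object)

Each worker thread will then see a different value.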
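As a minimal sketch of the shared counter object, assuming CPython and a hypothetical worker named tag (not part of the question's code), several threads can draw from one itertools.count without ever receiving the same value:

```python
import itertools
from concurrent import futures

counter_object = itertools.count()  # shared by all worker threads, starts at 0

def tag(item):
    # Hypothetical worker: pairs each item with the next counter value.
    return item, next(counter_object)

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    tagged = list(executor.map(tag, range(100)))

# Every counter value from 0 to 99 was handed out exactly once.
counts = sorted(count for _, count in tagged)
```

The counter values may reach the items in any order, since that depends on thread scheduling, but no value is repeated or skipped.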

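This relies on the fact that CPython has a global interpreter lock, which makes the next() call safe. If you are not using CPython, you have to use a thread-locking mechanism to protect the object against concurrent modification.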
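Where the interpreter lock cannot be relied on, one possible approach is to guard the counter with threading.Lock. The LockedCounter class below is a hypothetical helper written for illustration, not something from the question or from the standard library:

```python
import threading
from concurrent import futures

class LockedCounter:
    """Hypothetical helper: a counter guarded by an explicit lock."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # Only one thread at a time may read and increment the value.
        with self._lock:
            value = self._value
            self._value += 1
            return value

locked_counter = LockedCounter()

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    values = list(executor.map(lambda _: locked_counter.next(), range(200)))
```

The read and the increment happen under the same lock, so two threads cannot observe the same value.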

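Another issue is that get_data should not return a list but a single item. Let executor.map create the list (your loop is useless, even harmful, because it multiplies the number of computations).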
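To illustrate the multiplication, here is a small offline sketch. The names ids, calls and looping_worker are made up for this example, and the request is simulated by appending to a list:

```python
from concurrent import futures

ids = [1, 2, 3, 4]
calls = []  # records every simulated request

def looping_worker(_unused):
    # Same shape as the question's get_data: the argument is ignored
    # and the function loops over the whole ID list again.
    for i in ids:
        calls.append(i)

with futures.ThreadPoolExecutor() as executor:
    # list() drains the iterator so every task has finished.
    list(executor.map(looping_worker, ids))

total_calls = len(calls)  # len(ids) tasks, each doing len(ids) requests
```

With 4 IDs this performs 16 simulated requests instead of 4, which is why the same IDs appear to be processed repeatedly.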

So to sum it up:

def get_data(xml):
    url = 'url.com/Id={}'.format(xml)
    while True:
        try:
            xml_data1 = requests.get(url).text
            counter = next(counter_object)
            print(counter)
            break
        except requests.exceptions.RequestException as e:
            print(e)
    return xml_data1

Finally, executor.map returns an iterator that produces results as it is iterated. To create a list, you have to force iteration over it:

lst = list(executor.map(get_data, lst1))
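Putting the pieces together, the following is a sketch that runs without network access. It assumes a stand-in function fake_fetch in place of requests.get(url).text, and it starts the counter at 1; both are choices made for this illustration only:

```python
import itertools
from concurrent import futures

counter_object = itertools.count(1)  # start at 1 so the count matches items done

def fake_fetch(url):
    # Stand-in for requests.get(url).text so the sketch runs offline.
    return '<xml>{}</xml>'.format(url)

def get_data(item_id):
    url = 'url.com/Id={}'.format(item_id)
    xml_data = fake_fetch(url)
    next(counter_object)  # progress counter, one tick per ID
    return xml_data       # a single item, not a list

lst1 = [1, 2, 3, 4, 5, 6]

with futures.ThreadPoolExecutor() as executor:
    results = executor.map(get_data, lst1)  # an iterator, not yet a list
    lst = list(results)                     # forces iteration; order follows lst1
```

Each ID is fetched once, and lst holds the results in the same order as lst1 even though the worker threads may finish in a different order.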