I have a list (lst1) of numeric IDs (about 300K of them) that I pass to an API, appending each API result to another list (lst), like this:
    import requests

    lst = []
    lst1 = [1, 2, 3, 4, 5, 6]
    print(len(lst1))
    counter = 0
    for i in lst1:
        url = 'url.com/Id={}'.format(i)
        # retry the same ID until the request succeeds
        while True:
            try:
                xml_data1 = requests.get(url).text
                counter = counter + 1
                print(counter)
                #print(xml_data1)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        lst.append(xml_data1)
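When I apply the concurrent.futures library, the code keeps looping over the same IDs. I can tell because the counter numbers keep repeating. How can I prevent this?
Here is how I applied concurrent.futures: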
    def get_data(xml):
        print(len(lst1))
        #counter = 0
        for i in lst1:
            url = 'url.com/Id={}'.format(i)
            while True:
                try:
                    xml_data1 = requests.get(url).text
                    counter = counter+ 1
                    print(counter)
                    #print(xml_data1)
                    break
                except requests.exceptions.RequestException as e:
                    print(e)
            lst.append(xml_data1)

    with futures.ThreadPoolExecutor() as executor:
        df_list = executor.map(get_data, lst1)
Edit:

    def get_data(xml):
        #counter = 0
        for i in lst1:
            url = 'url.com/Id={}'.format(i)
            while True:
                try:
                    xml_data1 = requests.get(url).text
                    counter = next(counter_object)
                    print(counter)
                    #print(xml_data1)
                    break
                except requests.exceptions.RequestException as e:
                    print(e)
            lst.append(xml_data1)
        return lst

    with futures.ThreadPoolExecutor() as executor:
        lst = executor.map(get_data, lst1)
Answer 0 (score: 2)
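Integers are immutable, so counter = counter + 1 rebinds the name rather than modifying a shared object. One option is to make counter global by putting

    global counter

in your function.

You could also use itertools.count to define a global counter object (not an integer). This is my preferred approach, because it avoids using global on immutable objects such as integers, which always leads to mistakes and misunderstandings.

    import itertools
    counter_object = itertools.count() # default: starts at 0

Now:

    counter = counter+ 1

becomes:

    counter = next(counter_object)

Each worker thread will get a different value.

This relies on the fact that CPython has a global interpreter lock, which makes the operation safe. If you are not using CPython, you will have to use a thread-locking mechanism to protect the object from concurrent modification.

The other issue is that get_data should not return the list, but a single item. Let executor.map create the list (your loop is useless/harmful, because it multiplies the number of computations).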
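For the non-CPython case mentioned above, a lock-protected counter might look like the following. This is only a minimal sketch: LockedCounter and work are hypothetical names that do not appear in the question, and work is a stand-in for the real request.

```python
import threading
from concurrent import futures


class LockedCounter:
    """Hypothetical helper: a counter whose increments are serialized by a lock."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # Only one thread at a time may read and update the value.
        with self._lock:
            current = self._value
            self._value += 1
            return current


counter_object = LockedCounter()


def work(item):
    # Stand-in for the real request; returns the number this call drew.
    return counter_object.next()


with futures.ThreadPoolExecutor(max_workers=8) as executor:
    tickets = list(executor.map(work, range(1000)))

# True when every worker call received a distinct number.
print(len(set(tickets)) == len(tickets))
```

The lock makes the read-and-increment step atomic regardless of the interpreter, at the cost of a small amount of contention between threads.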
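So, to sum up:

    def get_data(xml):
        # xml is a single ID taken from lst1 by executor.map
        url = 'url.com/Id={}'.format(xml)
        while True:
            try:
                xml_data1 = requests.get(url).text
                counter = next(counter_object)
                print(counter)
                break
            except requests.exceptions.RequestException as e:
                print(e)
        return xml_data1

Finally, executor.map returns an iterator that has to be iterated over. To create a list, you have to force the iteration, for example by passing the result of executor.map(get_data, lst1) to list().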
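Putting the pieces together, the whole flow could be sketched as below. To keep the example runnable offline, the network call is replaced by fake_fetch, a hypothetical stand-in for requests.get(url).text, and the retry loop is omitted. The rest follows the names used in the question.

```python
import itertools
from concurrent import futures

counter_object = itertools.count()  # shared by all worker threads


def fake_fetch(url):
    # Hypothetical stand-in for requests.get(url).text (assumption, not a real API).
    return "<xml>{}</xml>".format(url)


def get_data(item_id):
    # Handles exactly one ID; executor.map takes care of looping over lst1.
    url = 'url.com/Id={}'.format(item_id)
    xml_data1 = fake_fetch(url)
    next(counter_object)  # one tick per processed ID
    return xml_data1      # return the single item, not a list


lst1 = [1, 2, 3, 4, 5, 6]

with futures.ThreadPoolExecutor() as executor:
    # executor.map returns an iterator whose results follow the order of lst1;
    # list() consumes it and collects the results.
    lst = list(executor.map(get_data, lst1))

print(len(lst))
```

Because each call to get_data processes one ID, the total number of requests equals len(lst1) instead of len(lst1) squared, which is what the loop inside the original function produced.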