Question

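I have a list of zipcodes, and I want to pull business information for them using the Yelp Fusion API. Each zipcode requires at least one API call (often more), so I want to be able to track my API usage, since the daily limit is 25,000. I have defined each zipcode as an instance of a user-defined Locale class. This Locale class has a class variable, Locale.pulls, which acts as a global counter of the number of pulls.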

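I'd like to multithread this using the multiprocessing module (multiprocessing.dummy), but I'm not sure whether I need to use a lock, and if so, how to do it. The concern is race conditions: I need to be sure that each thread sees the current number of pulls, which is stored in the Locale.pulls class variable in the pseudocode below.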

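import multiprocessing.dummy as mt  # Pool backed by threads, not processes


class Locale:
    pulls = 0            # class variable: global counter of API pulls
    MAX_PULLS = 20000    # stay below the 25,000 daily limit

    def __init__(self, x, y):
        # initialize the instance with the arguments needed to complete the API call
        self.x = x
        self.y = y
        self.data = None

    def call_yelp(self):
        # placeholder for the actual Yelp Fusion request
        raise NotImplementedError

    def pull(self):
        if Locale.pulls > Locale.MAX_PULLS:
            return None
        # make the request, store the returned data and increment the counter
        self.data = self.call_yelp()
        Locale.pulls += 1  # shared counter update: does this need a lock?
        return self.data


def fetch(args):
    # build a Locale from one entry of zipcodes and pull its data
    return Locale(*args).pull()


def main(zipcodes):
    # zipcodes is a list of (x, y) tuples used to initialize each Locale
    n_threads = max(1, len(zipcodes) // 100)  # let each thread work on ~100 zipcodes
    with mt.Pool(n_threads) as pool:
        return pool.map(fetch, zipcodes)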
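A minimal sketch of a lock-protected counter, assuming the thread pool from multiprocessing.dummy (threads share memory, so every thread sees the same Locale.pulls). The pseudocode checks the counter and increments it in separate steps, and `Locale.pulls += 1` is a read-modify-write that is not guaranteed to be atomic, so two threads can both pass the check and overshoot the limit. Here a single threading.Lock makes "check and increment" one step. The names `reserve_pull` and `_lock` are hypothetical, and `call_yelp` is a stand-in that returns a dummy dict instead of making a real Yelp Fusion request.

```python
import threading
from multiprocessing.dummy import Pool  # thread pool: workers share Locale.pulls


class Locale:
    pulls = 0                     # global pull counter shared by every thread
    MAX_PULLS = 20000             # safety margin under the 25,000 daily limit
    _lock = threading.Lock()      # hypothetical: one lock guarding `pulls`

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.data = None

    @classmethod
    def reserve_pull(cls):
        # Check and increment under the lock, so two threads cannot both
        # see "one pull left" and both go over the limit.
        with cls._lock:
            if cls.pulls >= cls.MAX_PULLS:
                return False
            cls.pulls += 1
            return True

    def call_yelp(self):
        # Hypothetical stand-in for the real Yelp Fusion request.
        return {"x": self.x, "y": self.y}

    def pull(self):
        if not Locale.reserve_pull():
            return None           # budget exhausted: skip this request
        # The slow network call runs outside the lock, so threads still overlap.
        self.data = self.call_yelp()
        return self.data


def fetch(args):
    # Build a Locale from one (x, y) tuple and pull its data.
    return Locale(*args).pull()


if __name__ == "__main__":
    zipcodes = [(i, i) for i in range(50)]      # hypothetical (x, y) arguments
    Locale.MAX_PULLS = 30                        # small limit to show the cutoff
    with Pool(8) as pool:
        results = pool.map(fetch, zipcodes)
    print(Locale.pulls)                          # 30
    print(sum(r is not None for r in results))   # 30
```

Counting a pull when it is reserved, before the request is sent, means a failed request still uses up budget. For a hard daily cap that errs on the safe side. Holding the lock only around the counter, not around the HTTP call, keeps the threads from serializing on the network request.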

Answer 1

A simple solution would be to check that len(zipcodes) < MAX_PULLS before running map().
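A minimal sketch of that pre-check, assuming a thread Pool from multiprocessing.dummy as in the question. The names `run_if_within_budget` and `fetch` are hypothetical, and `fetch` returns a dummy dict instead of calling the API. Because each zipcode needs at least one call and often more, `len(zipcodes) < MAX_PULLS` is a necessary condition only. It rules out a run that cannot fit in the budget, but it does not by itself stop a run from going over once some zipcodes need more than one call.

```python
from multiprocessing.dummy import Pool  # thread-based pool, as in the question

MAX_PULLS = 20000  # same safety margin as Locale.MAX_PULLS


def fetch(zipcode):
    # Hypothetical stand-in for building a Locale and calling pull().
    return {"zipcode": zipcode}


def run_if_within_budget(zipcodes, max_pulls=MAX_PULLS, threads=8):
    # Mirrors the check len(zipcodes) < MAX_PULLS: each zipcode costs at
    # least one call, so refuse to start rather than run out midway.
    n = len(zipcodes)
    if n >= max_pulls:
        raise ValueError(f"need at least {n} calls, budget is {max_pulls}")
    with Pool(threads) as pool:
        return pool.map(fetch, zipcodes)


if __name__ == "__main__":
    print(len(run_if_within_budget(["94103", "10001", "60601"])))  # 3
    try:
        run_if_within_budget(["94103"] * 5, max_pulls=5)
    except ValueError as err:
        print(err)  # need at least 5 calls, budget is 5
```

The check is cheap and catches an obviously oversized list up front. Enforcing the limit while threads are running still needs the counter itself to be protected, for example by a lock checked inside each thread.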
