Sharing a variable between threads in Python

Date: 2015-05-25 15:23:40

Tags: python multithreading threadpool

I need to count the total number of POST requests sent to the server. My script uses one thread per JSON file containing the post data. Here is a rough snippet:

statistics = 0

def load_from_file(some_arguments, filename):
    data_list = json.loads(open(filename).read())
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        statistics += 1 
        r = requests.post(url, data=json.dumps(data), headers=headers,
                          auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    pool = mp.Pool(mp.cpu_count() * 2)
    func = partial(load_from_file, some_arguments)
    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]
    pool.map(func, [join(directory, f) for f in file_list ])
    pool.close() 
    pool.join() 

    print "total post requests", statistics

I want to print the total number of POST requests processed by this script. Is this the right way to do it?

1 Answer:

Answer 0 (score: 0)

Shared memory is not that simple when using multiprocessing. I don't see a need for the multiprocessing module here instead of threading; multiprocessing is mainly used as a workaround for the Global Interpreter Lock (GIL).

In your example the work is IO-bound, so it is unlikely to saturate the CPU anyway. If you insist on multiprocessing rather than threading, I suggest you look at exchanging-objects-between-processes.
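As a minimal sketch of that multiprocessing route, a counter can live in shared memory via multiprocessing.Value; the worker function and item counts below are illustrative stand-ins, not the original script:

```python
import multiprocessing as mp

def count_items(counter, n):
    # Stand-in for load_from_file: pretend the file yields n posts,
    # incrementing the shared counter once per post.
    for _ in range(n):
        with counter.get_lock():  # the built-in lock makes += process-safe
            counter.value += 1

if __name__ == "__main__":
    # 'i' = C signed int; the Value lives in shared memory visible to children
    statistics = mp.Value('i', 0)
    procs = [mp.Process(target=count_items, args=(statistics, n))
             for n in (10, 20, 30)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print("total post requests", statistics.value)  # prints 60
```

Unlike a plain global, the Value is visible to child processes because it is allocated in shared memory rather than copied on fork.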

Otherwise, with threading you can share the global statistics variable between threads.

import threading

statistics = 0

def load_from_file(some_arguments, filename):
    global statistics
    data_list = json.loads(open(filename).read())
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        statistics += 1
        r = requests.post(url, data=json.dumps(data), headers=headers,
                        auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    threads = []
    func = partial(load_from_file, some_arguments)

    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]

    for f in file_list:
        t = threading.Thread(target=func, args=(join(directory, f),))
        t.start()
        threads.append(t)

    #Wait for threads to finish
    for thread in threads:
        thread.join()

    print "total post requests", statistics

Note: this currently spawns one thread per file in the directory, all at once. You may want to impose some limit on the number of concurrent threads for best performance.
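One way to cap the thread count, sketched below with concurrent.futures.ThreadPoolExecutor, also guards the shared counter with an explicit lock, since `statistics += 1` is not guaranteed atomic. The worker and item counts are illustrative stand-ins for the original load_from_file:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

statistics = 0
stats_lock = threading.Lock()

def process_file(item_count):
    # Stand-in for load_from_file: pretend each file yields item_count posts.
    global statistics
    for _ in range(item_count):
        with stats_lock:  # += is read-modify-write; the lock prevents lost updates
            statistics += 1

# max_workers bounds concurrency instead of one thread per file
with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(process_file, [5, 10, 15])

print("total post requests", statistics)  # prints 30
```

Exiting the `with` block waits for all submitted work to finish, replacing the manual start/join loops.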