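I have been asked to count the total number of POST requests sent to the server. My script uses one thread per JSON file, and each file contains the POST data. Here is a rough snippet of the code.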
statistics = 0

def load_from_file(some_arguments, filename):
    data_list = json.loads(open(filename).read())
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        statistics += 1
        r = requests.post(url, data=json.dumps(data), headers=headers,
                          auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    pool = mp.Pool(mp.cpu_count() * 2)
    func = partial(load_from_file, some_arguments)
    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]
    pool.map(func, [join(directory, f) for f in file_list])
    pool.close()
    pool.join()
    print "total post requests", statistics
I want to print the total number of POST requests processed by this script. Is this the right way to do it?
Answer 0 (score: 0)
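Sharing memory between processes is not that simple: each worker process gets its own copy of a module-level variable, so increments made in the workers never reach the parent. I don't see a need to use the multiprocessing module here instead of threading. Multiprocessing is mainly used as a workaround for the Global Interpreter Lock (GIL).
In your example the work is IO-bound, so it will probably not use the full CPU time anyway. If you insist on multiprocessing rather than threads, I suggest taking a look at exchanging-objects-between-processes.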
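If you do stay with multiprocessing, one way to sidestep shared state is to have each worker return its own count and add the counts up in the parent. The following is only a minimal sketch under that assumption: `count_results` and `total_posts` are hypothetical names, and the actual `requests.post` call is left out so the example stays self-contained.

```python
import json
import multiprocessing as mp


def count_results(filename):
    """Return how many entries the file's "results" list holds (one POST each)."""
    with open(filename) as fh:
        payload = json.load(fh)
    # A real worker would send one requests.post(...) per entry here.
    return len(payload.get("results", []))


def total_posts(filenames, processes=2):
    """Run count_results in a process pool and sum the per-file counts."""
    with mp.Pool(processes) as pool:
        # pool.map returns each worker's return value to the parent process.
        return sum(pool.map(count_results, filenames))
```

Call `total_posts(paths)` from under an `if __name__ == "__main__":` guard, since some platforms re-import the main module in every worker process.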
Otherwise, use threading. Threads live in the same process, so you can share the global statistics variable between them.
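import json
import threading
from functools import partial
from os import listdir
from os.path import isfile, join

import requests
from requests.auth import HTTPBasicAuth

# address, config, headers, username, password and getUrl are assumed to be
# defined elsewhere in the script, as in the question.
statistics = 0
statistics_lock = threading.Lock()  # "+=" on a shared int is not atomic

def load_from_file(some_arguments, filename):
    global statistics
    with open(filename) as f:
        data_list = json.load(f)
    url = address + getUrl(filename, config)
    for data in data_list.get("results"):
        with statistics_lock:
            statistics += 1
        r = requests.post(url, data=json.dumps(data), headers=headers,
                          auth=HTTPBasicAuth(username, password))

def load_from_directory(some_arguments, directory):
    threads = []
    func = partial(load_from_file, some_arguments)
    file_list = [f for f in listdir(directory) if isfile(join(directory, f))]
    for f in file_list:
        # args must be a tuple, hence the trailing comma
        t = threading.Thread(target=func, args=(join(directory, f),))
        t.start()
        threads.append(t)
    # Wait for threads to finish
    for thread in threads:
        thread.join()
    print("total post requests: %d" % statistics)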
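Note: as written, this starts one thread per file in the directory, all at the same time. You may want to implement some kind of limit for best performance.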
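One possible way to add such a limit is a fixed-size thread pool. The sketch below assumes Python 3 and `concurrent.futures`. `run_limited`, `handle_file` and `fake_handle_file` are hypothetical names introduced for illustration, and the worker is assumed to return the number of requests it sent, so no shared counter is needed.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(filenames, handle_file, max_workers=8):
    """Process files with at most max_workers threads; return the summed counts.

    handle_file is a hypothetical callable that sends the POST requests for
    one file and returns how many it sent.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields each call's return value, in input order.
        return sum(executor.map(handle_file, filenames))


def fake_handle_file(filename):
    # Stand-in for the real worker: pretend every file holds three posts.
    return 3


total = run_limited(["a.json", "b.json"], fake_handle_file, max_workers=2)
print("total post requests: %d" % total)
```

The value of `max_workers` is a tuning knob. For IO-bound work like this, it can usually be set well above the CPU count, but the best number depends on the server and the network.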