Question

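Let's say I have a web bot written in Python that sends data to a website via POST requests. The data is pulled line by line from a text file and passed into an array. Currently, I test each element of the array with a simple for loop. How can I effectively implement multithreading to iterate through the data faster? Let's say the text file is fairly large. Would attaching a thread to each request be smart? What do you think is the best approach?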

# Raw string: in a normal string literal "\f" would be read as a form feed
with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code: testData stands for whatever sends the POST request
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

# Iterate over the lines directly so that every line is tested
for data in dataArr:
    test(data)

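I was thinking of something along these lines, but I feel it would cause problems depending on the size of the text file. I know there is software that lets the end user specify the number of threads when working with large amounts of data. I'm not entirely sure how that works, but that's what I'd like to implement.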

import threading

# Raw string: in a normal string literal "\f" would be read as a form feed
with open(r"c:\file.txt") as file:
    dataArr = file.read().splitlines()

def test(data):
    # This next part is pseudo code: testData stands for whatever sends the POST request
    result = testData('www.example.com', data)
    if result == 'whatever':
        print('success')

jobs = []

# One thread per line of the file
for data in dataArr:
    # args must be a tuple, hence the trailing comma
    thread = threading.Thread(target=test, args=(data,))
    jobs.append(thread)

for j in jobs:
    j.start()
for j in jobs:
    j.join()
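The "user-specified number of threads" idea mentioned above is commonly done with a fixed-size pool of worker threads instead of one thread per line. The following is only a minimal sketch, assuming Python 3 (concurrent.futures). send_post and run_all are hypothetical names, and send_post is a stand-in that does not touch the network.

```python
from concurrent.futures import ThreadPoolExecutor


def send_post(line):
    """Hypothetical stand-in for the real POST request (no network access)."""
    # Assumption for the demo: the site answers 'whatever' for lines ending in a digit.
    return 'whatever' if line[-1:].isdigit() else 'other'


def run_all(lines, num_threads=8):
    """Test every line with at most `num_threads` requests in flight."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() keeps the input order and reuses the same worker threads.
        responses = list(pool.map(send_post, lines))
    # Keep only the lines whose response matched.
    return [line for line, resp in zip(lines, responses) if resp == 'whatever']


if __name__ == "__main__":
    sample = ['alpha', 'beta1', 'gamma', 'delta2']
    print(run_all(sample, num_threads=2))
```

Here num_threads is the value an end user would choose. The number of threads stays constant no matter how many lines the file contains.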

Answer 1

This sounds like a recipe for multiprocessing.Pool

See here: https://docs.python.org/2/library/multiprocessing.html#introduction

from multiprocessing import Pool

def test(num):
    if num%2 == 0:
        return True
    else:
        return False

if __name__ == "__main__":
    list_of_datas_to_test = [0, 1, 2, 3, 4, 5, 6, 7, 8]

    p = Pool(4)  # create 4 processes to do our work
    print(p.map(test, list_of_datas_to_test))  # distribute our work

The output looks like this:

[True, False, True, False, True, False, True, False, True]
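Applied to the question's use case, the same Pool pattern might look roughly like the sketch below, assuming Python 3. check_line and successes are hypothetical names, and check_line is a stand-in for the real request. imap_unordered with a chunksize is used so that results are consumed as they arrive and lines are sent to the workers in batches.

```python
from multiprocessing import Pool


def check_line(line):
    """Hypothetical stand-in for the real request; returns (line, success)."""
    return line, line == 'whatever'


def successes(lines, workers=4):
    """Return the lines that passed, checked by a pool of `workers` processes."""
    with Pool(workers) as pool:
        # Results arrive in completion order; chunksize batches several
        # lines into each message sent to a worker process.
        results = pool.imap_unordered(check_line, lines, chunksize=100)
        # Consume the results inside the with-block, before the pool is torn down.
        return sorted(line for line, ok in results if ok)


if __name__ == "__main__":
    print(successes(['foo', 'whatever', 'bar'], workers=2))
```

The worker function has to live at module level so that it can be pickled and sent to the worker processes.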

Answer 2

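Threads are slow in Python because of the Global Interpreter Lock. You should consider using multiple processes with the Python multiprocessing module instead of threads. Using multiple processes can increase the "spin-up" time of your code, since spawning a real process takes more time than spawning a lightweight thread, but because of the GIL, threading won't accomplish what you're after.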

Here and here are a couple of basic resources on using the multiprocessing module. Below is an example from the second link:

import multiprocessing as mp
import random
import string

# Define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                    string.ascii_lowercase
                    + string.ascii_uppercase
                    + string.digits)
               for i in range(length))
    output.put(rand_str)

# The guard keeps child processes from re-running this block on platforms
# that start them by re-importing the main module (e.g. Windows)
if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()

    # Set up a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output)) for x in range(4)]

    # Run processes
    for p in processes:
        p.start()

    # Get process results from the output queue before joining, so that
    # no child blocks while flushing its queued data
    results = [output.get() for p in processes]

    # Wait for the processes to exit
    for p in processes:
        p.join()

    print(results)

How to efficiently implement multithreading/multiprocessing in a Python web bot?
