Question

I have a text file containing a list of dates. I want to pass each date as an argument to a shell script and run the script for every date in the file.

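I want to do this in parallel using Python. Because the script has complex logic and its execution needs to be monitored, I want to run 5 instances at a time. As soon as one script finishes, Python must start a new thread.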

import threading
import time


class mythread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        self.h = i
        # Script will call the function

    def run(self):
        time.sleep(1)
        print("Value send ", self.h)


f = open(r'C:\Senthil\SenStudy\Python\Date.txt').readlines()
num = threading.activeCount()

for i in f:
    print("Active threads are ", num)
    time.sleep(1)
    if threading.activeCount() <= 5:
        thread1 = mythread(i)
        thread1.start()
    else:
        print("Number of Threads are More than 5 .. going to sleep state for 1 mint ...")
        time.sleep(1)

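I tried using threading.activeCount() to get the number of running threads, but from the very start it reports 30 threads (which is the number of date entries in the file).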

Answer 1

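Your problem looks tailor-made for a Python process pool or thread pool. If the input to each "thread" is just a date, I think a process pool may be the better choice, because synchronization between threads can be tricky.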

Please read the documentation for the multiprocessing module and see whether it solves your problem. If you have any questions about it, we will be happy to clarify.

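(An example of a process pool is right at the beginning of that documentation. If you really think you need a thread pool, the syntax will be the same: just replace multiprocessing with its thread-based counterpart.)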
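
As a rough illustration of the pool idea, here is a minimal sketch. It assumes the thread-backed pool from the standard library's multiprocessing.dummy module, which exposes the same Pool API as multiprocessing; that module choice is an assumption made for this example and is not taken from the answer. The names process_date and run_all, and the sample dates, are hypothetical.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed pool with the Pool API

WORKERS = 5


def process_date(date):
    # Hypothetical placeholder for the real work, such as launching the script.
    time.sleep(0.1)
    return 'done ' + date


def run_all(dates, workers=WORKERS):
    # At most `workers` calls run at the same time; map() returns the
    # results in the same order as the input dates.
    pool = Pool(workers)
    try:
        return pool.map(process_date, dates)
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    # Sample dates, made up for the example.
    print(run_all(['2020-01-01', '2020-01-02', '2020-01-03']))
```

Changing the import to `from multiprocessing import Pool` should give worker processes instead of threads with the same calls. In that case the worker function has to be defined at module level, and on Windows the `if __name__ == '__main__':` guard is required.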

Answer 2

If you are sure you need threads rather than processes, you can use ThreadPoolExecutor to run a fixed number of worker threads that do the work:

from concurrent.futures import ThreadPoolExecutor


DATE_FILE = 'dates.txt'
WORKERS = 5


def process_date(date):
    print('Start processing', date)

    # Put here your complex logic.

    print('Finish processing', date)


def main():

    with open(DATE_FILE) as date_file:
        dates = [line.rstrip() for line in date_file]

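    # The pool runs at most WORKERS calls at once; a free worker picks up
    # the next date as soon as its current call returns.
    with ThreadPoolExecutor(WORKERS) as executor:
        # Consume the results so exceptions raised in workers surface here.
        list(executor.map(process_date, dates))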


if __name__ == '__main__':
    main()

If you are using Python 2, you must first install the futures library for this to work:

pip install --user futures
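
The placeholder inside process_date could be filled by launching the shell script with subprocess. The sketch below is one possible way to do that, assuming Python 3.5 or later for subprocess.run. SCRIPT_CMD is a hypothetical stand-in (a Python one-liner that echoes its argument) so that the example runs anywhere; the real script path is not given in the question and would have to be substituted. The run_all helper is also an illustrative name.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

WORKERS = 5

# Hypothetical stand-in for the real shell script: a one-liner that echoes
# its argument. Replace it with the actual command, e.g. ['sh', 'script.sh'].
SCRIPT_CMD = [sys.executable, '-c',
              'import sys; print("processed", sys.argv[1])']


def process_date(date):
    """Run the script for one date and return (date, exit code, stdout)."""
    result = subprocess.run(
        SCRIPT_CMD + [date],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return date, result.returncode, result.stdout.strip()


def run_all(dates, workers=WORKERS):
    # No more than `workers` scripts run at once; each worker thread blocks
    # until its script exits and then takes the next date.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(process_date, dates))


if __name__ == '__main__':
    # Sample dates, made up for the example.
    for date, code, output in run_all(['2020-01-01', '2020-01-02']):
        print(date, code, output)
```

Returning the exit code for each date makes it possible to see which runs failed once the whole batch has finished.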

Run 5 threads at a time in Python
