Question

I'm writing a brute-forcer and would like advice on making it more efficient by reducing its complexity.

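I first used multithreading, but quickly realized it isn't suited to CPU-bound tasks, so I switched to the multiprocessing library and saw a real improvement in the script's execution time.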
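Threads did not help because the per-candidate work in a JWT brute-forcer is short Python-level computation (one HMAC per word) that runs under CPython's global interpreter lock (GIL), so threads take turns rather than running in parallel. A minimal sketch of that per-candidate work, assuming the token is signed with HS256 (the question does not say which algorithm is used); `b64url_decode`, `parse_hs256` and `check_secret` are hypothetical helper names. Parsing the token once, outside the loop, keeps each candidate down to a single HMAC computation.

```python
import base64
import hashlib
import hmac


def b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url-encoded with the '=' padding removed; restore it first
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def parse_hs256(token: str):
    """Split a JWT once into the bytes that were signed and the expected signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    return signing_input, b64url_decode(signature_b64)


def check_secret(secret: str, signing_input: bytes, expected: bytes) -> bool:
    """Return True if `secret` reproduces the token's HMAC-SHA256 signature."""
    digest = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    return digest == expected
```

In a worker loop this becomes `check_secret(word, signing_input, expected)` for each word, with `signing_input` and `expected` computed once in the parent and passed to the processes.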

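Going from one process (4 minutes) to 8 processes (46 seconds) was a real improvement. With more than 8 processes, the script takes longer to run. Is that expected?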
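For CPU-bound work, a slowdown past a certain process count is typical: once there are more processes than CPU cores, the extra processes only add context switching and contention on the shared queue. If the machine has 8 logical CPUs (an assumption; the question does not say), that would explain the 8-process optimum. A small sketch that defaults the worker count to the CPU count, with `effective_process_count` as a hypothetical helper name:

```python
import os


def effective_process_count(requested=None):
    """Clamp the requested worker count to the number of CPUs the OS reports.

    os.cpu_count() can return None, hence the fallback to 1.
    """
    available = os.cpu_count() or 1
    if requested is None:
        return available
    return max(1, min(requested, available))
```

On Python 3.13+, `os.process_cpu_count()` additionally respects CPU affinity, which matters inside containers or when the script is pinned to a subset of cores.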
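I fill a multiprocessing.Queue() with the wordlist and then feed the processes from that queue. Would it be better to read a line from the wordlist and process it directly, without using the queue?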
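One way to skip the pre-filled queue is to let a pool read the wordlist directly. The sketch below, with hypothetical names `first_match` and `predicate`, uses `imap_unordered` with a `chunksize`, so words travel to the workers in batches (one pickle and one pipe write per chunk rather than per word), and the file is read by the pool's task-handler thread while the workers are already running instead of being loaded completely before any work starts. `predicate` must return a `(word, matched)` pair. The function works with any object offering `imap_unordered`, such as `multiprocessing.Pool` or, for quick testing, `multiprocessing.pool.ThreadPool`.

```python
def first_match(pool, predicate, lines, chunksize=1000):
    """Return the first stripped line for which predicate reports a match, else None.

    predicate(word) must return (word, matched); with a process pool it has to be a
    module-level function so that it can be pickled.
    """
    words = (line.rstrip() for line in lines)  # lazy: lines are stripped as they are read
    for word, matched in pool.imap_unordered(predicate, words, chunksize):
        if matched:
            return word
    return None
```

With processes this would look like `with multiprocessing.Pool(process_count) as pool: first_match(pool, check_word, wordlist)`, where `check_word` is a hypothetical module-level function wrapping the signature check. Leaving the `with` block calls `terminate()`, which stops the remaining workers once a match has been returned. The `chunksize` of 1000 is an arbitrary starting value worth tuning.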

Method that fills the queue with the contents of the wordlist:

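import multiprocessing

def populate_queue(wordlist, verbose):
    """ Loads up the wordlist
        into the multiprocessing.Queue()
    """
    # DEBUG and RESET are defined elsewhere in the script; Style presumably comes from colorama
    queue = multiprocessing.Queue()

    for entry in wordlist:
        word = entry.rstrip()  # strip the trailing newline once per line
        if verbose:
            print(DEBUG + "Inserting into queue: " + Style.BRIGHT + word + RESET)
        queue.put(word)

    return queue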
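If the queue is kept, a large part of its overhead is per item: every `put()`/`get()` pickles or unpickles the object and takes a lock, once per word. A sketch of a batched variant, where `populate_queue_batched`, `batched`, `batch_size` and the `None` sentinel convention are all assumptions rather than part of the original script. It puts lists of words, adds one `None` per worker so each process knows when to stop (rather than relying on `empty()`, which the documentation describes as unreliable), and returns the word count it saw, since `Queue.qsize()` raises `NotImplementedError` on some platforms such as macOS.

```python
import itertools
import multiprocessing


def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched only exists on Python 3.12+)."""
    it = iter(iterable)
    while batch := list(itertools.islice(it, size)):
        yield batch


def populate_queue_batched(wordlist, process_count, batch_size=1000):
    """Put the wordlist on a queue as lists of words, then one None sentinel per worker.

    Returns (queue, word_count); the count replaces queue.qsize(), which is not
    implemented on every platform.
    """
    queue = multiprocessing.Queue()
    word_count = 0
    for batch in batched((line.rstrip() for line in wordlist), batch_size):
        queue.put(batch)
        word_count += len(batch)
    for _ in range(process_count):
        queue.put(None)  # sentinel: tells one worker there is no more work
    return queue, word_count
```

A worker then loops over `queue.get()`, checks every word in the returned list, and exits when it receives `None`.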

The main function of my script:

def main():
    # Register the CTRL+C trap
    signal.signal(signal.SIGINT, signal_handler)

    global start_time

    args = parse_args()

    process_count = args.process_count
    token = args.token
    wordlist = args.wordlist
    verbose = args.verbose

    processes = []

    ## Variables summary
    print(INFO + "JWT: " + Style.BRIGHT + "{}".format(token) + RESET)
    print(INFO + "Wordlist: " + Style.BRIGHT + "{}".format(wordlist.name) + RESET)

    start_time = time.time()
    print("[*] starting {}".format(time.ctime()))

    # Load the wordlist into the queue
    print(INFO + "Processing the wordlist..." + RESET)
    queue = populate_queue(wordlist, verbose)

    print(INFO + "Total retrieved words: " + Style.BRIGHT + "{}".format(queue.qsize()) + RESET)

    for i in range(process_count):
        process = Process(queue, token, verbose)
        print(INFO + "Starting Process #{}".format(i) + RESET)
        process.start()
        processes.append(process)

    print(WARNING + "Pour yourself some coffee, this might take a while..." + RESET)

    # Block the parent process until all child processes have finished processing the queue
    for process in processes:
        process.join()

    if not exit_flag:
        print(RESULT + "No match found" + Style.RESET_ALL)

    end_time = time.time()
    print("[*] finished {}".format(time.ctime()))

    elapsed_time = end_time - start_time
    print("[*] elapsed time {} sec".format(elapsed_time))
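One thing worth checking in `main()`: if `exit_flag` is a plain global that the worker processes set when they find the secret, the parent never sees that change, because each process has its own copy of its globals. A sketch of one alternative, assuming the queue holds lists of words followed by one `None` sentinel per worker: a shared `multiprocessing.Event` replaces the flag and a second queue carries the match back. `worker`, `run_workers` and `is_match` are hypothetical names.

```python
import multiprocessing


def worker(task_queue, found, results, is_match):
    """Check batches of words until a sentinel arrives or any worker sets `found`."""
    while not found.is_set():
        batch = task_queue.get()
        if batch is None:  # sentinel: no more work for this worker
            return
        for word in batch:
            if is_match(word):
                results.put(word)
                found.set()  # visible to the parent and to every other worker
                return


def run_workers(task_queue, process_count, is_match):
    """Start the workers, wait for them, and return the matching word or None.

    is_match must be picklable (a module-level function) when processes are spawned.
    """
    found = multiprocessing.Event()
    results = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=(task_queue, found, results, is_match))
        for _ in range(process_count)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    if not found.is_set():
        return None
    # Unread batches stay in the task queue; without this, the parent's feeder
    # thread may block interpreter exit while trying to flush them.
    task_queue.cancel_join_thread()
    return results.get()
```

The other workers stop at batch granularity: each one finishes the list it is currently checking, then sees the event and exits. In `main()`, the value returned by `run_workers` would take the place of the `exit_flag` check.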

Any suggestions are welcome.

Thanks

Improving the complexity of a multiprocessing brute-forcer
