Question

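I am using a ThreadPoolExecutor and giving the exact same task to every worker. The task is to run a jar file and do something with its output. The problem I'm facing is with timing.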

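Case 1: I submit one task to the pool and the worker finishes in 8 seconds.

Case 2: I submit the same task to the pool twice and both workers finish in about 10.50 seconds.

Case 3: I submit the same task to the pool three times and all three workers finish in about 13.38 seconds.

Case 4: I submit the same task to the pool four times and all four workers finish in about 18.88 seconds.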

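If I replace the worker task with time.sleep(8) (instead of running the jar file), all 4 workers finish in about 8 seconds. Is this because the OS has to create a Java environment before executing the Java code, and the OS cannot manage that in parallel?

Can someone explain why the execution time of the same task increases when it is run in parallel? Thanks :)

This is how I'm executing the pool: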

    from concurrent import futures
    from subprocess import PIPE, run

    # s3, log, jar_file_path, event_name, config_dir, tmp_file_path and
    # MAX_WORKERS are defined elsewhere in the program.

    def transfer_file(file_name):
        raw_file_obj = s3.Object(bucket_name='foo-bucket', key=file_name)
        body = raw_file_obj.get()['Body']

        # prepare java command
        java_cmd = (
            "java -server -ms650M -mx800M -cp {} commandline.CSVExport "
            "--sourcenode=true --event={} --mode=human_readable --configdir={}"
        ).format(jar_file_path, event_name, config_dir)

        # Run decoder_tool by piping in the encoded binary bytes
        log.info("Running java decoder tool for file {}".format(file_name))
        res = run(java_cmd, cwd=tmp_file_path, shell=True, input=body.read(), stderr=PIPE, stdout=PIPE)
        res_output = res.stderr.decode("utf-8")

        if res.returncode != 0:
            if 'Unknown event' in res_output:
                log.error("Exception occurred whilst running decoder tool")
                raise Exception("Unknown event {}".format(event_name))
        log.info("decoder tool output: \n" + res_output)

    with futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # add new task(s) into thread pool; list() consumes the results so
        # that an exception raised inside a worker is re-raised here
        list(pool.map(transfer_file, ['fileA_for_workerA', 'fileB_for_workerB']))

Answer 1

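Using multiple threads doesn't necessarily mean the work will finish faster. Python threads have to contend with the GIL to execute Python code. Think of it this way: one person doing one task is faster than one person doing two tasks at the same time, because he or she has to multitask, doing part of thread 1 first, then switching to thread 2, and so on. The more threads there are, the more the Python interpreter has to juggle. That said, a thread that is blocked waiting on a child process does not hold the GIL, so in this case the extra time is more likely spent by the java processes competing with each other for CPU cores, memory, and disk than inside the interpreter.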
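One way to check where the time goes is to run the same kind of pool with a CPU-bound stand-in for the java command and watch how the per-task duration changes with the number of workers. The following is a minimal sketch under stated assumptions: `BUSY_CMD`, `run_once`, and `time_workers` are hypothetical names that do not appear in the question, and the stand-in child process is a short Python loop rather than the real jar.

```python
import subprocess
import sys
import time
from concurrent import futures

# Hypothetical stand-in for the java command: a child process that burns CPU.
BUSY_CMD = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]


def run_once(_):
    """Run one child process and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(BUSY_CMD, check=True)
    return time.perf_counter() - start


def time_workers(n):
    """Launch n identical child processes from n threads; return per-task durations."""
    with futures.ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(run_once, range(n)))


if __name__ == "__main__":
    for n in (1, 2, 4):
        durations = time_workers(n)
        print(n, "worker(s):", [round(d, 2) for d in durations])
```

If the per-task durations stay flat until the number of workers exceeds the number of free cores and grow after that, the child processes are probably competing for CPU, which is something the Python thread pool cannot change.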

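The same thing may be happening on the Java side too. I don't use Java, but it may have the same problem. The question "Is Java a Compiled or an Interpreted programming language?" says that the JVM translates Java on the fly, so the JVM probably has to deal with the same issues as Python.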

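As for time.sleep(8), all it does is block the thread for that long without using any CPU time, so it's easy to switch between a bunch of waiting tasks.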
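The sleep case from the question can be reproduced in a few lines. This is only an illustrative sketch: `sleepy_task` and `run_sleepers` are hypothetical helper names, and the delay is shortened from 8 seconds to keep the run quick.

```python
import time
from concurrent import futures


def sleepy_task(seconds):
    """Block the calling thread without using CPU, then return the delay."""
    time.sleep(seconds)
    return seconds


def run_sleepers(n, seconds):
    """Run n sleeping tasks on n threads and return the total wall-clock time."""
    start = time.perf_counter()
    with futures.ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(sleepy_task, [seconds] * n))
    return time.perf_counter() - start


if __name__ == "__main__":
    # The four sleeps overlap, so the total should be close to one delay,
    # not four times the delay.
    print(round(run_sleepers(4, 0.5), 2))
```

Because a sleeping thread needs neither the GIL nor a CPU core, the waits overlap almost perfectly, which matches the roughly 8 seconds reported for four workers.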

Python multithreading takes longer to execute multiple jar files
