Question

I just wrote a multithreaded Python script that is run like this:

    python myScript.py -cpu_n 5 -i input_file

To run it on my hundreds of input files, I generate a list of commands (commands.list), one per input file:

    python myScript.py -cpu_n 5 -i input_file1
    python myScript.py -cpu_n 5 -i input_file2
    python myScript.py -cpu_n 5 -i input_file3
    ...

I am trying to schedule them with the 'parallel' command, using 10 CPUs on each of three different machines:

    parallel -S 10/$server1 -S 10/$server2 -S 10/$server3 < commands.list

My question is: what is the maximum number of CPUs that will be used on each server with this parallel command? Will it be 5 * 10 = 50, or only 10 CPUs?

Answer 1

From man parallel:

   --jobs N
   -j N
   --max-procs N
   -P N     Number of jobslots on each machine. Run up to N
            jobs in parallel.  0 means as many as possible.
            Default is 100% which will run one job per CPU
            core on each machine.


   -S
   [@hostgroups/][ncpu/]sshlogin[,[@hostgroups/][ncpu/]sshlogin[,...]]
   :
            GNU parallel will determine the number of CPU
            cores on the remote computers and run the number
            of jobs as specified by -j.  If the number ncpu
            is given GNU parallel will use this number for
            number of CPU cores on the host. Normally ncpu
            will not be needed.

So your command will run up to 10 jobs in parallel on each server.

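Whether each of your commands will really use 5 CPU cores is unclear. If each command does use 5 cores, then 10 jobs * 5 cores = 50 cores will be used on each server. In that case I suggest you do not use the ncpu/server syntax, but instead use:

    parallel -j 20% -S $server1,$server2,$server3 < commands.list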

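This way you can mix servers with different numbers of cores, and GNU Parallel will start one job for every 5 cores (1/5 of the core count) on each server.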
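
The arithmetic behind both commands can be sketched in a few lines of Python. This is only an illustration, not how GNU Parallel computes anything internally: the helper names `job_slots` and `busy_cores` are made up for this example, it assumes every job really keeps 5 cores busy, and rounding fractions down is an assumption (GNU Parallel's own rounding may differ when the percentage does not divide evenly).

```python
import math


def job_slots(cores: int, jobs_percent: float) -> int:
    """Job slots for `-j N%`: N percent of the cores, but at least one slot.

    Assumption: fractions are rounded down here.
    """
    return max(1, math.floor(cores * jobs_percent / 100))


def busy_cores(slots: int, threads_per_job: int) -> int:
    """Cores kept busy when every slot runs a job with this many threads."""
    return slots * threads_per_job


THREADS_PER_JOB = 5  # from `-cpu_n 5` in the question

# Original command: `-S 10/$server` declares 10 cores and the default
# `-j 100%` gives one slot per declared core, so 10 slots.
original = busy_cores(job_slots(10, 100), THREADS_PER_JOB)

# Suggested command: `-j 20%` on a hypothetical server with 10 detected cores.
suggested = busy_cores(job_slots(10, 20), THREADS_PER_JOB)

print(original, suggested)
```

Under these assumptions the original command can load each server with 50 busy cores, while `-j 20%` keeps the load at roughly the number of cores the server actually has.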
