Question

I have a shell script that reads a text file line by line and passes each line as an argument to a Python script.

file="some_file.txt"
while IFS= read -r line
do
    python some_script.py "$line"
done <"$file"

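some_script.py takes a few minutes to process one line, and the text file has more than 10,000 lines, so the shell script takes a very long time to finish.

How can I run this in parallel? For example, running 10 instances of python some_script.py $line at the same time would cut the total run time by roughly 90%.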

Answer 1

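max_jobs=10    # number of background jobs per batch
running=0
while IFS= read -r line; do
    python some_script.py "$line" &    # start the job in the background
    running=$((running + 1))
    # once a full batch has been started, wait for all of it to finish
    if [ "$running" -ge "$max_jobs" ]; then wait; running=0; fi
done <"$file"
wait    # wait for the final, possibly partial, batch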
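
The loop above waits for a whole batch to finish before starting the next one, so a single slow line can leave the other slots idle. A possible refinement, sketched here on the assumption that bash 4.3 or later is available (for wait -n), starts a new job as soon as any running job exits. The process_line function, the demo input, and max_jobs=3 are hypothetical stand-ins so the snippet runs on its own; in the real script the function body would be the python some_script.py "$line" call.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for: python some_script.py "$line"
results=$(mktemp)
process_line() { sleep 0.1; echo "done: $1" >>"$results"; }

# Demo input (made up); the real script would read some_file.txt.
file=$(mktemp)
printf '%s\n' a b c d e f g >"$file"

max_jobs=3      # upper bound on concurrent jobs
running=0
while IFS= read -r line; do
    if [ "$running" -ge "$max_jobs" ]; then
        wait -n                     # bash >= 4.3: returns when any one job exits
        running=$((running - 1))
    fi
    process_line "$line" &
    running=$((running + 1))
done <"$file"
wait                                # wait for the jobs that are still running

finished=$(sort "$results")
printf '%s\n' "$finished"
rm -f "$file" "$results"
```

Completion order is not guaranteed, which is why the results are sorted before being printed.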

Answer 2

Using bash (for the $'\n' syntax) and GNU xargs (for the -d and -P arguments):

# runs one python process per line, with whole line passed as an argument
<"$file" xargs -d $'\n' -P10 -n1 python some_script.py

One caveat: this passes each line to the Python script as a single argument. If the script expects the words of a line as separate arguments, the shell has to perform string splitting first.
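
One way to get that string splitting, offered as a sketch and not a definitive recipe, is to wrap the command in sh -c and leave $1 unquoted so the inner shell splits each line into words. The demo file and the echo stand-in for python some_script.py are assumptions made so the snippet runs on its own. It passes -d '\n' because GNU xargs interprets that escape itself, so bash's $'\n' syntax is not needed here. Note that an unquoted expansion also performs glob expansion, so a line containing * or ? may expand to file names.

```shell
# Demo input file (made up for illustration).
file=$(mktemp)
printf '%s\n' 'first line of arguments' 'line 2 arguments' >"$file"

# xargs hands each whole line to the inner shell as $1. Leaving $1 unquoted
# makes that shell split it into words (and expand globs).
# `set -- $1; echo ...` stands in for: python some_script.py $1
split_out=$(<"$file" xargs -d '\n' -P10 -n1 sh -c 'set -- $1; echo "$# words: $*"' _)

printf '%s\n' "$split_out"
rm -f "$file"
```

The trailing _ fills $0 of the inner shell, so the line itself arrives as $1. Output order can differ between runs because the jobs run concurrently.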

Answer 3

Using GNU Parallel:

parallel -a some_file.txt -j 10 python some_script.py

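You can show progress with --progress and change the number of parallel jobs with -j N. You can also use --tag to prefix each output line with the arguments that produced it, and --eta to get an estimated time of completion. It is also quite simple to distribute jobs across multiple hosts and to change what happens when a job fails.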

Finally, use:

parallel --dry-run ...

to see exactly what it would do without actually running anything.

So, if args.txt looks like this:

first line of arguments
line 2 arguments
this the the third set

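you can split each line into separate arguments with --colsep:

parallel -a args.txt --colsep ' ' echo someScript.py

which prints the following (the order can vary because the jobs run in parallel):

someScript.py line 2 arguments
someScript.py first line of arguments
someScript.py this the the third set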
