Question

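With --pipe -N<int> I can send a given number of lines as input to each job started by parallel. But how do I run several jobs on each block, each with a different argument given via :::?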

Let's take this small input file (the columns are tab-separated):

A   B   C
D   E   F
G   H   I
J   K   L

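In addition, every two lines should be piped to a parallel job. On each such block the command cut -f<int> should be run, with the column number as the input argument, e.g. ::: {1..3}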

So for the given example, the output should look like this:

A
D
B
E
C
F
G
J
H
K
I
L
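To pin down the intended semantics, here is a minimal sequential sketch in plain POSIX sh (no GNU Parallel involved; cut_pairs is a hypothetical helper name). It reads the input two lines at a time and runs cut -f1, cut -f2 and cut -f3 on each pair. It assumes the columns are separated by tab characters, which is cut's default delimiter.

```shell
# Hypothetical reference helper (not part of GNU Parallel): read stdin two
# lines at a time and run cut -f1, -f2, -f3 on each 2-line block in turn.
cut_pairs() {
  while IFS= read -r first; do
    if IFS= read -r second; then
      block=$(printf '%s\n%s' "$first" "$second")
    else
      block=$first            # odd line count: the last block has one line
    fi
    for col in 1 2 3; do
      # cut reads the block from this pipe, not from the loop's stdin
      printf '%s\n' "$block" | cut -f"$col"
    done
  done
}

# The question's sample, with real tab characters between the columns
printf 'A\tB\tC\nD\tE\tF\nG\tH\tI\nJ\tK\tL\n' > input.txt
cut_pairs < input.txt
```

It prints the expected output shown above, so it can serve as a reference to compare a parallel version against.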

I tried the following command:

cat input.txt|parallel --pipe -N2 'cut -f{1}' ::: {1..3}

But the output is:

A
D
I
L

What am I missing?

finswimmer

Answer 1

This:

cat input.txt|parallel --pipe -N2 'cut -f{1}' ::: {1..3}

reads 2 records from each input source. It becomes clearer if you run it with -v:

$ cat input.txt|parallel --pipe -v -N2 'cut -f{}' ::: {1..3}
cut -f1  -f2
cut: only one type of list may be specified
Try 'cut --help' for more information.
cut -f3
I
L

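GNU Parallel pairs each argument with one block. What you are looking for is closer to --tee, where every block is sent to every command. --tee, however, does not split the input into blocks; it sends all of the input to every command. So perhaps we can combine the two: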

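# inner level: split the stream it receives into 2-line blocks, run cut -f<column> on each
doit() { parallel --pipe -N2 -v cut -f$@; }
export -f doit
# outer level: --tee sends the entire input to one doit job per column number
cat input.txt|parallel --pipe --tee -v doit {} ::: {1..3}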
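If I read the two levels correctly, this version starts one outer job per column, and each of those splits the whole input into 2-line blocks. The output is therefore grouped by column first (A D G J, then B E H K, then C F I L, assuming the jobs' output is printed in job order, e.g. with -k) rather than block first as in the question. The loop nesting can be emulated sequentially in plain POSIX sh; columns_then_blocks is a hypothetical name, and split -l 2 stands in for the inner --pipe -N2:

```shell
# Hypothetical emulation of the nesting above (no GNU Parallel involved):
# the outer --tee level becomes a loop over the columns, the inner
# --pipe -N2 level a loop over the 2-line blocks of the whole input.
columns_then_blocks() {
  tmp=$(mktemp -d)
  split -l 2 "$1" "$tmp/block."      # creates block.aa, block.ab, ...
  for col in 1 2 3; do               # outer level: one job per column
    for block in "$tmp"/block.*; do  # inner level: one job per block
      cut -f"$col" "$block"
    done
  done
  rm -r "$tmp"
}

# The question's sample, with real tab characters between the columns
printf 'A\tB\tC\nD\tE\tF\nG\tH\tI\nJ\tK\tL\n' > input.txt
columns_then_blocks input.txt
```

Swapping the two for loops gives the block-first order from the question, which is what nesting the levels the other way round amounts to.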

Or you can swap the order (which may be less efficient):

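# inner level: --tee sends the whole 2-line block to cut -f1, cut -f2 and cut -f3
doit() { parallel -v --pipe --tee cut -f{} ::: {1..3}; }
export -f doit
# outer level: split input.txt into 2-line blocks and run doit on each block
cat input.txt|parallel --pipe -N2 -v doit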

Remove -v once you are happy with what is being run.

--tee is very efficient (1-2 GBytes/s with --pipe, 2-3 GBytes/s with --pipepart), but it has the drawback that it starts all jobs in parallel: if instead of {1..3} you had 10000 values, it would start 10000 processes.
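The answer does not cover this, but one way to avoid starting every column job at once might be to drop the inner --tee and let each block's job loop over the column numbers itself. A minimal sketch, assuming a POSIX shell and small blocks that fit in memory; cut_columns is a hypothetical name:

```shell
# Hypothetical worker: read one block from stdin, then run cut once per
# column number passed as an argument, one after another. Only one cut
# runs per block at a time, instead of one concurrent job per column.
cut_columns() {
  block=$(cat)        # buffer the block; $(...) drops trailing newlines
  for col in "$@"; do
    printf '%s\n' "$block" | cut -f"$col"   # printf restores one newline
  done
}

# Try it on the first 2-line block of the question's data (tab-separated)
printf 'A\tB\tC\nD\tE\tF\n' | cut_columns 1 2 3
```

In bash, the function can be handed to GNU Parallel the same way as doit: export -f cut_columns, then cat input.txt | parallel --pipe -N2 -k cut_columns 1 2 3. The -k (--keep-order) option prints the output in block order, which the question's expected output relies on. The trade-off is that the columns of one block are processed sequentially rather than concurrently.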

GNU parallel: combining --pipe with arguments
