Question

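I have File1 and File2, shown below. I found similar questions, but none that match exactly.

I want to use each line of File1 as the input to grep and extract the first column of File2. In the toy example below, if column 2 of File2 equals a or b, column 1 of that row is written to File_ab.

So far I am using a double loop, and the estimated run time is 4 days. I was hoping for something like: cat File1 | xargs -P 12 -exec grep "$1\|$2" File2 > File_$1$2.txt but I could not get the syntax right. I am trying to run 12 greps in parallel, each with an OR condition.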
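For reference, the xargs idea can be made syntactically valid roughly as follows. This is only a sketch on the toy data: it assumes every File1 line holds exactly two plain-word keys (no regex metacharacters), that File2 fields are separated by a single space, and that the local xargs supports -P, which GNU and BSD xargs do but POSIX does not require. The output names follow the desired File_ab form rather than the .txt suffix in the attempt.

```shell
#!/bin/sh
# Toy inputs copied from the question; run this in an empty scratch directory.
printf '%s\n' 'a b' 'c d' > File1
printf '%s\n' '1 a' '2 b' '3 c' '1 d' '4 a' '5 e' '6 d' > File2

# -n 2 hands the two words of one File1 line to each sh invocation as $1 and $2;
# -P 12 keeps up to 12 of those invocations running at once.
# The trailing "sh" fills $0 so that the keys land in $1 and $2.
xargs -n 2 -P 12 sh -c '
    grep -E " ($1|$2)\$" File2 | cut -d " " -f 1 > "File_$1$2"
' sh < File1
```

Each grep still scans all of File2, so with 25K lines in File1 this means 25K passes over 10 million lines. The parallelism divides that cost by at most 12; it does not remove it.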

File1
a b
c d

File2
1 a
2 b
3 c
1 d
4 a
5 e
6 d

The desired output is 2 files, File_ab and File_cd:

File_ab
1
2
4
File_cd
1
3
6

Note: my File1 has 25K lines and File2 has 10 million lines.

Answer 1

Using perl:

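#!/usr/bin/perl

use strict;
use warnings;
no strict 'refs';    # FileCache's documented usage relies on symbolic filehandles
use FileCache;       # cacheout() works around the per-process open-file limit

# Map every key on a File1 line to that line's output file (a and b => File_ab).
my (%file, @keys);
open my $in, '<', 'File1' or die "Cannot open File1: $!";
while (my $line = <$in>) {
    my @parts = split ' ', $line;
    next unless @parts;
    push @keys, @parts;
    $file{$_} = 'File_' . join('', @parts) for @parts;
}
close $in;

# One anchored alternation over all keys, with regex metacharacters escaped,
# so that a key only matches when it is the whole second column.
my $alternation = join '|', map { quotemeta } @keys;
my $re = qr/^(\d+)\s+($alternation)\s*$/;

# Read File2 (named on the command line) once and route column 1 by column 2.
while (<>) {
    if (/$re/) {
        my $fh = cacheout $file{$2};
        print $fh $1, "\n";
    }
}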

Then:

chmod 755 myscript
./myscript File2
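The same single-pass idea can also be sketched in awk, for readers who prefer to stay in the shell. This is an illustration on the toy data, not a tuned implementation: it assumes whitespace-separated columns and exactly two keys per File1 line.

```shell
#!/bin/sh
# Toy inputs copied from the question; run this in an empty scratch directory.
printf '%s\n' 'a b' 'c d' > File1
printf '%s\n' '1 a' '2 b' '3 c' '1 d' '4 a' '5 e' '6 d' > File2

awk '
    # While reading the first file (File1): map both keys to one output name.
    NR == FNR { out[$1] = out[$2] = "File_" $1 $2; next }
    # While reading the second file (File2): route column 1 by column 2.
    $2 in out { print $1 > out[$2] }
' File1 File2
```

File2 is read once, and each row costs one hash lookup. Rows are written in File2 order, so File_cd comes out as 3, 1, 6; pipe each file through sort -n if the sorted order shown in the question is required. With 25K output files, an awk that keeps every file open may hit the open-file limit. gawk documents that it tries to multiplex descriptors in that case, which is the job FileCache does in the perl script.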

Parallelise grep - using the lines of a file as input to grep
