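This is similar to a question I asked earlier (see the link below), but this time I want to output the common strings to rows rather than columns, as shown below.
I have two files, each containing a single column like this:
File 1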
chr1 106623434
chr1 106623436
chr1 106623442
chr1 106623468
chr1 10699400
chr1 10699405
chr1 10699408
chr1 10699415
chr1 10699426
chr1 10699448
chr1 110611528
chr1 110611550
chr1 110611552
chr1 110611554
chr1 110611560
File 2
chr1 1066234
chr1 106994
chr1 1106115
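I want to search File 1 and pull out every line that exactly matches line 1 of File 2, then output all of those matches on a single row. I then want to do the same for line 2 of File 2, and so on, until the matches for every line of File 2 have been found in File 1 and written to their own rows. I am also working with very large files, so I need something that does not require File 2 to be held entirely in memory; otherwise it will not run to completion. I would like the output to look like this: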
chr1 106623434 chr1 106623436 chr1 106623442 chr1 106623468
chr1 10699400 chr1 10699405 chr1 10699408 chr1 10699415 chr1 10699426 chr1 10699448
chr1 110611528 chr1 110611550 chr1 110611552 chr1 110611554 chr1 110611560
Similar question: How to move all strings in one file that match the lines of another to columns in an output file?
Answer 0 (score: 3)
As long as your patterns do not overlap completely, this should work fine:
$ while IFS= read -r p; do grep -- "$p" file1 | tr '\n' '\t'; echo ""; done < file2
chr1 106623434 chr1 106623436 chr1 106623442 chr1 106623468
chr1 10699400 chr1 10699405 chr1 10699408 chr1 10699415 chr1 10699426 chr1 10699448
chr1 110611528 chr1 110611550 chr1 110611552 chr1 110611554 chr1 110611560
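To make the overlap caveat concrete: `grep "$p"` matches the key anywhere in the line, so a key that is a substring of another key's lines would pull those lines in as well. The following is only a sketch of a stricter variant, not the answerer's command. It anchors the key at the start of the line and joins matches with `paste -s`, which avoids the trailing tab that `tr` leaves. It assumes the keys contain no regular-expression metacharacters, and the temporary directory `dir` and the shortened sample files are made up for illustration.

```shell
# Shortened sample data based on the files in the question (hypothetical paths)
dir=$(mktemp -d)
printf '%s\n' 'chr1 106623434' 'chr1 106623436' 'chr1 10699400' 'chr1 10699405' > "$dir/file1"
printf '%s\n' 'chr1 1066234' 'chr1 106994' > "$dir/file2"

# For each key, keep only file1 lines that START with the key,
# then join those lines into one tab-separated row.
result=$(
  while IFS= read -r p; do
    grep -- "^$p" "$dir/file1" | paste -s -
  done < "$dir/file2"
)
printf '%s\n' "$result"
```

Like the original loop, this still reads file1 once per key, so it trades speed for low memory use.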
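Answer 1 (score: 1)
You can do it like this. It uses close to zero memory, but it will be very slow because it reads the whole of "file1" once for every line of "file2":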
$ cat tst.awk
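{
    ofs = ors = ""                  # reset the separators for each line of file2
    while ( (getline line < "file1") > 0 ) {
        if (line ~ "^"$0) {         # the file1 line starts with the current key
            printf "%s%s", ofs, line
            ofs = "\t"
            ors = "\n"              # end the row only if something matched
        }
    }
    printf "%s", ors
    close("file1")                  # so the next key rereads file1 from the top
}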
$ awk -f tst.awk file2
chr1 106623434 chr1 106623436 chr1 106623442 chr1 106623468
chr1 10699400 chr1 10699405 chr1 10699408 chr1 10699415 chr1 10699426 chr1 10699448
chr1 110611528 chr1 110611550 chr1 110611552 chr1 110611554 chr1 110611560
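If file1 happens to be grouped in the same order as the keys in file2, as it is in the sample data, the repeated rereading can be avoided. The sketch below is a different technique from the script above: a merge-style single pass that holds only the current key and the current output row in memory. It rests on two assumptions that the question does not confirm: the groups in file1 appear in the same order as the keys in file2, and every line of file1 starts with one of the keys. The variable names (`dir`, `keyfile`, `have_key`, `row`) and the shortened sample files are illustrative.

```shell
# Shortened sample data; file1 is grouped in the same order as the keys (assumption)
dir=$(mktemp -d)
printf '%s\n' 'chr1 106623434' 'chr1 106623436' 'chr1 10699400' 'chr1 10699405' 'chr1 110611528' > "$dir/file1"
printf '%s\n' 'chr1 1066234' 'chr1 106994' 'chr1 1106115' > "$dir/file2"

result=$(awk -v keyfile="$dir/file2" '
BEGIN { have_key = ((getline key < keyfile) > 0) }   # load the first key
{
    # Advance to the next key until the current line starts with it,
    # printing the finished row on the way.
    while (have_key && index($0, key) != 1) {
        if (row != "") { print row; row = "" }
        have_key = ((getline key < keyfile) > 0)
    }
    if (have_key) row = (row == "" ? $0 : row "\t" $0)
}
END { if (row != "") print row }                     # print the last row
' "$dir/file1")
printf '%s\n' "$result"
```

Using `index($0, key) == 1` compares the key as a literal prefix rather than as a regular expression. If the ordering assumption does not hold, lines will be dropped silently, so the slower per-key loop is the safer choice for unsorted input.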
Answer 2 (score: 0)
You can try:
awk -v OFS="\t" '
NR==FNR {                   # first file only (file2)
    keys[++i] = $0          # keys[] stores the patterns to search for; i counts them
    next                    # stop processing this record and
                            # go on to the next one
}
{
    for (j = 1; j <= i; ++j)
        # if the line starts with the key, append it to that key's row
        if ($0 ~ "^" keys[j])
            a[keys[j]] = a[keys[j]] (a[keys[j]] != "" ? OFS : "") $0
}
END {
    for (j = 1; j <= i; ++j) print a[keys[j]]   # print one row per key
}' file2 file1
You get:
chr1 106623434 chr1 106623436 chr1 106623442 chr1 106623468
chr1 10699400 chr1 10699405 chr1 10699408 chr1 10699415 chr1 10699426 chr1 10699448
chr1 110611528 chr1 110611550 chr1 110611552 chr1 110611554 chr1 110611560