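I have two tab-delimited input files. I need to match them on $1 && $2; wherever a line of file2 matches, its 3rd and 4th fields should shift down to the following lines:
Input:
file1: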
p1 555
p1 557
p3 558
file2:
p1 323 lololo aaaa
p1 555 papapp kkka
p1 556 hooho sssa
p1 557 jjjlo kkka
p3 424 zzzzz llla
p3 558 jjjjj ssss
Output:
p1 323 lololo aaaa
p1 555
p1 556 papapp kkka
p1 557
p3 424 hooho sssa
p3 558
       jjjlo kkka
etc.
Thanks
Answer 0 (score: 3)
Something along these lines should work:
awk 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2
That is:
NR == FNR {                      # while processing the first file (file1)
  to_shift[$1,$2] = 1            # remember which lines to shift
  next                           # and do nothing else
}
{                                # afterwards (processing file2):
  queue[++w] = $3 OFS $4         # queue the next payload fields
}
to_shift[$1, $2] {               # if this is a shift line
  print $1, $2                   # print only the first two fields
  next                           # and do nothing else
}
{                                # otherwise, print the first two fields and
  print $1, $2, queue[++r]       # the next queued payload
}
END {                            # in the end:
  while(r != w) {                # print out what remains in the queue, i.e.
    print OFS OFS queue[++r]     # all that was shifted out at the bottom
  }
}
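To try this against the sample data from the question, a small script along these lines might help. It is only a sketch: the scratch directory and the `shift_fields` wrapper name are made up for the demo, and the awk program itself is the one shown above.

```shell
#!/bin/sh
# Scratch directory for the demo; the path is generated, not part of the question.
demo_dir=$(mktemp -d)

# file1: the (field 1, field 2) keys whose payload should be shifted down.
printf '%s\t%s\n' p1 555 p1 557 p3 558 > "$demo_dir/file1"

# file2: the four-column, tab-delimited data from the question.
printf '%s\t%s\t%s\t%s\n' \
    p1 323 lololo aaaa \
    p1 555 papapp kkka \
    p1 556 hooho sssa \
    p1 557 jjjlo kkka \
    p3 424 zzzzz llla \
    p3 558 jjjjj ssss > "$demo_dir/file2"

# shift_fields is a hypothetical wrapper name around the program above.
# $1 is the key file, $2 is the data file.
shift_fields() {
    awk 'NR == FNR { to_shift[$1,$2] = 1; next }
         { queue[++w] = $3 OFS $4 }
         to_shift[$1, $2] { print $1, $2; next }
         { print $1, $2, queue[++r] }
         END { while (r != w) { print OFS OFS queue[++r] } }' "$1" "$2"
}

shift_fields "$demo_dir/file1" "$demo_dir/file2"
```

With the default separators, the first six lines come out as in the question, and the three payloads that were shifted out at the bottom (jjjlo kkka, zzzzz llla, jjjjj ssss) follow on their own lines, each preceded by two spaces.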
I suspect that for formatting you will want to use \t as the output field separator, in which case you just pass -v OFS='\t' to awk:
awk -v OFS='\t' 'NR == FNR { to_shift[$1,$2] = 1; next } { queue[++w] = $3 OFS $4 } to_shift[$1, $2] { print $1, $2; next } { print $1, $2, queue[++r] } END { while(r != w) { print OFS OFS queue[++r] } }' file1 file2
If the input is tab-delimited and fields can contain spaces, you can also pass -F '\t' to make the input field separator a tab.
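As a rough illustration of why -F '\t' matters, here is a sketch with payload fields that contain spaces. The file names (keys.tsv, data.tsv), the sample rows, and the `shift_fields_tsv` wrapper are invented for this example and do not come from the question.

```shell
#!/bin/sh
# Scratch directory with made-up sample data (not from the question).
tsv_dir=$(mktemp -d)

# One key to shift: (p1, 555).
printf 'p1\t555\n' > "$tsv_dir/keys.tsv"

# The third field contains a space, so whitespace splitting would break it up.
printf 'p1\t323\tfirst payload\taaaa\n'   > "$tsv_dir/data.tsv"
printf 'p1\t555\tsecond payload\tkkka\n' >> "$tsv_dir/data.tsv"

# Hypothetical wrapper: same program, but tabs for both input and output.
shift_fields_tsv() {
    awk -F '\t' -v OFS='\t' '
        NR == FNR { to_shift[$1,$2] = 1; next }
        { queue[++w] = $3 OFS $4 }
        to_shift[$1, $2] { print $1, $2; next }
        { print $1, $2, queue[++r] }
        END { while (r != w) { print OFS OFS queue[++r] } }' "$1" "$2"
}

shift_fields_tsv "$tsv_dir/keys.tsv" "$tsv_dir/data.tsv"
```

Here "first payload" stays in one column on the p1 323 line, the p1 555 line is printed with only its two key fields, and "second payload" with kkka ends up on a final line that starts with two tabs. Without -F '\t', awk would split "first payload" into two fields and queue only part of the payload.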