I have a file containing a single column of numbers (numbers.txt):
2
5
126
3005
65
I also have another text file that looks like this (input.txt):
# 126 2 0
bla mjnb kjh ojj
# 5 65 0
kjh jhgg kjhkjh juh
hgj ikaw esd cdqw
# 100 3005 0
jhgjh jh jhjhg pol
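The first line of each block (the one starting with #) is what matters: both numbers written after the # must appear in numbers.txt for the block to be kept. I wrote the following code, but on my huge files it would take weeks. My numbers.txt contains about 2500 numbers.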
#!/bin/bash
cat numbers.txt | while read first
do
for second in $(cat numbers.txt)
do
awk -v RS="#" "/ $first $second / {sub(/^ /,RS);print}" input.txt >> output1.txt
done
done
The output should be:
# 126 2 0
bla mjnb kjh ojj
# 5 65 0
kjh jhgg kjhkjh juh
hgj ikaw esd cdqw
Can anyone suggest a faster way to get this output?
Answer 0 (score: 1)
With awk you can do this without nested loops:
awk 'FNR==NR{a[$0];next} $1=="#" && ($2 in a) && ($3 in a) {p=1}
$1=="#" && (!($2 in a) || !($3 in a)) {p=0} p' numbers.txt input.txt
# 126 2 0
bla mjnb kjh ojj
# 5 65 0
kjh jhgg kjhkjh juh
hgj ikaw esd cdqw
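To try this on the sample data from the question, something like the following sketch should work. The scratch directory and the `workdir`, `result` variable names are assumptions made for the illustration, not part of the original answer; the awk program itself is the one shown above.

```shell
#!/bin/bash
# Scratch directory so the sample files do not overwrite real data
# (an assumption of this sketch).
workdir=$(mktemp -d)

# Sample data copied from the question.
printf '%s\n' 2 5 126 3005 65 > "$workdir/numbers.txt"
cat > "$workdir/input.txt" <<'EOF'
# 126 2 0
bla mjnb kjh ojj
# 5 65 0
kjh jhgg kjhkjh juh
hgj ikaw esd cdqw
# 100 3005 0
jhgjh jh jhjhg pol
EOF

# First file: store every number as an array key.
# Second file: on "#" lines, set p to 1 if both numbers are keys, else 0.
# The trailing "p" prints the current line while the flag is set.
result=$(awk 'FNR==NR{a[$0];next}
$1=="#" && ($2 in a) && ($3 in a) {p=1}
$1=="#" && (!($2 in a) || !($3 in a)) {p=0} p' "$workdir/numbers.txt" "$workdir/input.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because the numbers are stored as whole lines (`a[$0]`), this assumes numbers.txt has no stray spaces or carriage returns around each number.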
Answer 1 (score: 1)
awk '
# read the numbers file into the array "num"
NR == FNR {num[$1]; next}
# if this is a "#" line and the first 2 numbers are in "num" set a flag to "true"
$1 == "#" {p = (($2 in num) && ($3 in num))}
# print the current line if the flag is true
p
' numbers.txt input.txt
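For a rough sense of why the single pass helps, the arithmetic can be sketched as below. It assumes numbers.txt has exactly 2500 lines (the question says "about 2500"); the variable names are made up for the illustration.

```shell
#!/bin/bash
# Size of numbers.txt, taken from the question (approximate).
n=2500

# The nested loop starts awk once per ordered pair (first, second),
# and every one of those runs reads all of input.txt.
nested_runs=$((n * n))

# The array-based version starts awk once and reads each file once.
single_runs=1

echo "nested loop: $nested_runs awk runs"
echo "array lookup: $single_runs awk run"
```

Under that assumption the nested loop launches awk 6250000 times, which is consistent with the runtime of weeks reported in the question.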