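I have two files, File1 and File2. I need to find the keywords listed in File2 inside File1 and count them. Lines in File1 that contain none of the keywords from File2 should be counted as OTHERS and, if possible, saved to File3 (for verification).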
File1
000001111YYYY0000
122334YYYY9999
89898989AAAA89899
AAAA7678989812234
ZZZZ878098098098
0000000000000000
File2
YYYY
AAAA
ZZZZ
File3 (OTHERS)
0000000000000000
Output
YYYY: 2
AAAA: 2
ZZZZ: 1
OTHERS: 1
The approach I know is to use grep and wc -l to count each keyword, but that is not ideal, especially when there are many keywords to look for.
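For comparison, the grep-based approach mentioned above could look roughly like the sketch below. It assumes the sample data from this question written to a temporary directory; the names workdir, counts and others are only illustrative. It runs grep once per keyword, which is why it scales poorly when there are many keywords.

```shell
# Sample data from the question, written to a throwaway directory (assumed layout).
workdir=$(mktemp -d)
printf '%s\n' 000001111YYYY0000 122334YYYY9999 89898989AAAA89899 \
    AAAA7678989812234 ZZZZ878098098098 0000000000000000 > "$workdir/File1"
printf '%s\n' YYYY AAAA ZZZZ > "$workdir/File2"

# One grep pass over File1 per keyword; grep -c counts the matching lines.
counts=$(
    while IFS= read -r keyword; do
        printf '%s: %s\n' "$keyword" "$(grep -c -- "$keyword" "$workdir/File1")"
    done < "$workdir/File2"
)

# Lines matching none of the keywords: -v inverts the match, -f reads patterns from File2.
grep -v -f "$workdir/File2" "$workdir/File1" > "$workdir/File3"
others=$(wc -l < "$workdir/File3" | tr -d ' ')

printf '%s\nOTHERS: %s\n' "$counts" "$others"
```

Note that grep -c counts matching lines, not individual occurrences, so a keyword appearing twice on one line is counted once here.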
Answer 0 (score: 2)
Using awk
Command line
awk 'FNR==NR{a[$1];next}\
{b=1;for(i in a)if(z=gsub(i,"&")){x[i]+=z;b=0}}\
b{x["Others"]++;print > "file3"}\
END{for(i in x)print i, x[i]}' file2 file1
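Because of its length, this is probably better suited to a script:
# First file (file2): remember each keyword as an array index.
FNR==NR{
    Strings[$1]
    next
}
# Second file (file1): count the matches of every keyword on the current line.
{
    Found=0
    for(Regex in Strings)
        if(matches=gsub(Regex,"&")){
            Sums[Regex]+=matches
            Found=1
        }
}
# No keyword matched: count the line as Others and save it to file3.
!Found{
    Sums["Others"]++
    print > "file3"
}
END{
    for(Regex in Sums)
        print Regex, Sums[Regex]
}
Save it as awkscript.awk and run it with:
awk -f awkscript.awk file2 file1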
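The counting in this answer relies on gsub(Regex,"&"): each match is replaced with itself, so the line is left unchanged, and the return value is the number of matches. A minimal sketch of that behaviour on a made-up input string (the variable name gsub_demo is only illustrative):

```shell
# gsub returns the number of substitutions; "&" stands for the matched text,
# so the line itself is not modified.
gsub_demo=$(printf '%s\n' 'AAAA12AAAA' | awk '{ n = gsub("AAAA", "&"); print n, $0 }')
printf '%s\n' "$gsub_demo"
```

Because the count is per occurrence, a keyword that appears twice on one line adds 2 to its total.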
Answer 1 (score: 0)
Try this: if you do not care whether the output follows the order of file1 or file2, the following may help.
awk 'FNR==NR{A[$0];next} {VAL=$0;gsub(/[0-9]/,"")} ($0 in A){B[$0]++;next} !($0 in A){OTHERS[VAL]++} END{for(i in B){print i": "B[i]};for(j in OTHERS){print j": "OTHERS[j]}}' file2 file1
I will add an explanation shortly.
EDIT1: Adding the code in non-one-liner form, with a proper explanation.
awk 'FNR==NR{         #### FNR==NR is true only while the first file (file2) is being read. NR and FNR are built-in awk variables that both hold line numbers: FNR is reset whenever a new file starts, while NR keeps increasing until all files have been read.
A[$0];                #### Create an entry in array A whose index is $0 (the current line) of file2.
next                  #### Skip all remaining statements for this line.
}
{
VAL=$0;               #### Save the unmodified current line in a variable named VAL.
gsub(/[0-9]/,"");     #### gsub is a built-in awk function; here it globally replaces every digit in the file1 line with an empty string.
}
($0 in A){            #### If the edited $0 (current line without digits) is an index of array A, run the following statements.
B[$0]++;              #### Increment the entry of array B whose index is $0 by 1.
next                  #### Skip all remaining statements for this line.
}
!($0 in A){           #### If the edited current line is NOT an index of array A.
OTHERS[VAL]++         #### Increment the entry of array OTHERS whose index is VAL (the original line) by 1.
}
END{                  #### The END section runs after all input has been read.
for(i in B){          #### Traverse array B.
print i": "B[i]       #### Print each index of array B with its count.
};
for(j in OTHERS){     #### Traverse array OTHERS.
print j": "OTHERS[j]  #### Print each index of array OTHERS with its count.
}
}
' file2 file1         #### The input files: keywords first, then the data.
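The FNR==NR idiom explained above can be checked in isolation. The following is a small sketch with made-up file names and contents that prints both counters for every line of two files; the two values are equal only while the first file is being read.

```shell
# Two throwaway files; names and contents are illustrative only.
demo=$(mktemp -d)
printf '%s\n' YYYY AAAA > "$demo/first"
printf '%s\n' line1 line2 line3 > "$demo/second"

# Print the file name, NR and FNR for every input line.
nr_fnr=$(cd "$demo" && awk '{ print FILENAME ": NR=" NR " FNR=" FNR }' first second)
printf '%s\n' "$nr_fnr"
```

In the output, FNR starts again at 1 on the first line of the second file while NR continues from 3, so FNR==NR is false from that point on.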
Answer 2 (score: 0)
awk solution (including saving the "others" to a separate file, file3.txt):
awk 'NR==FNR{ group=(group)?group"|"$0 : $0; next }
{ if(match($0,group)){ a[substr($0,RSTART,RLENGTH)]++ }
else { a["OTHERS"]++; print >> "file3.txt" }
} END { for(i in a) print i": "a[i] }' file2 file1
Output:
ZZZZ: 1
AAAA: 2
YYYY: 2
OTHERS: 1
Others:
cat file3.txt
0000000000000000
Answer 3 (score: 0)
awk 'BEGIN{a["OTHERS"]=0}
(NR==FNR) {a[$0]=0;next}
{b=0}{for(i in a) if( match($0,i) !=0 ){a[i]++;b=1} }
{if(b==0) a["OTHERS"]++}
END{for(i in a) print i,": ",a[i]}' File2 File1