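Add a column if two fields were previously found in the file
I have a comma-separated file with a large number of entries, and I need to find all rows where two columns match: the second and the seventh. If both are found on more than one line, an eighth column should be added that says "shared".
File contents: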
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314
Desired output:
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
I have searched and found this link, Awk - matching on 2 columns for differents lines, but it does not do quite what I need: it only matches against the following line.
I can do it like this:
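while IFS=',' read -r host device blk poolnum porta portb serial
do
# Count the lines containing both the device and the serial (one full scan of the file per input line).
ldev_count=$(grep -iw -- "$device" outputtest.txt | grep -ciw -- "$serial")
if [[ $ldev_count -gt 1 ]] ; then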
echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
else
echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
fi
done < outputtest.txt
But it is very slow. I am hoping to find a better solution.
Thanks for your help.
Answer 0: (score: 3)
You probably need this:
awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file
Result:
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
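Explanation
We iterate over the file twice:
First pass: NR==FNR{a[$2]++;b[$7]++;next}
We count how often each value occurs in the two columns and store the counts in the a and b arrays.
Second pass: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1
This selects the lines whose counts match what you expect: for both columns the count must be greater than 1 for the new trailing column to be added with $(NF+1)="shared".
Note: the final 1 is just a shortcut that avoids writing an explicit print statement.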
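To see what the two counters do, here is a minimal sketch on a small made-up input (the temporary file and its rows are hypothetical, not taken from the question). Because a and b are counted independently, a line is marked when its 2nd field repeats somewhere and its 7th field repeats somewhere, which need not be on the same other line. For comparison, the sketch also runs a variant that counts the pair of fields under one key, using a hypothetical array name c.

```shell
# Hypothetical sample: 1808 occurs twice and 111 occurs twice,
# but no (field 2, field 7) pair occurs more than once.
tmp=$(mktemp)
printf '%s\n' \
  'hostA,1808,x,10,1D,2D,111' \
  'hostB,1808,x,10,1D,2D,222' \
  'hostC,1809,x,10,1D,2D,111' \
  'hostD,1810,x,10,1D,2D,333' > "$tmp"

# Same program as above: two passes, independent counters for $2 and $7.
independent=$(awk -F, 'NR==FNR{a[$2]++;b[$7]++;next}
a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' "$tmp" "$tmp")

# Variant for comparison: one counter keyed on the ($2,$7) pair.
paired=$(awk -F, 'NR==FNR{c[$2,$7]++;next}
c[$2,$7]>1{$(NF+1)="shared"}1' OFS=',' "$tmp" "$tmp")

echo "independent counters:"; printf '%s\n' "$independent"
echo "pair counter:";         printf '%s\n' "$paired"
rm -f "$tmp"
```

With this input the independent counters mark only the hostA line, while the pair counter marks nothing. On the sample data from the question both give the same result, since every serial is identical and each device number appears three times.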
Answer 1: (score: 2)
Could you please try the following and let me know if it helps you.
awk -F, 'FNR==NR{a[$2,$7]++;next} a[$2,$7]>1{print $0",shared"}' Input_file Input_file
The output is as follows.
WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared
EDIT: If you want to print the matching lines with the "shared" string and print the non-matching lines unchanged, the following may help you.
awk -F, '                  ##Setting the field delimiter to a comma.
FNR==NR{                   ##FNR==NR is TRUE only while the first Input_file is being read.
  a[$2,$7]++               ##Counting how many times each ($2,$7) pair (2nd and 7th fields) occurs.
  next                     ##Skipping all further statements for this line.
}
a[$2,$7]>1{                ##Reached only while the 2nd Input_file is being read; TRUE when the pair occurs more than once.
  print $0",shared"        ##Printing the current line with ",shared" appended.
  next                     ##Skipping the default print below.
}
1                          ##Printing every remaining (non-matching) line unchanged.
' Input_file Input_file    ##Mentioning the Input_file twice here.
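If the data arrives on a pipe and cannot be read twice, a single-pass variant is possible. The following is only a sketch under that assumption, not something from the answers above: it buffers every line in memory, counts the ($2,$7) pairs, and decides in the END block. The helper name mark_shared and the sample rows are made up for illustration.

```shell
# Single-pass sketch: buffer every line, count ($2,$7) pairs, decide in END.
# mark_shared is a hypothetical helper name; it reads CSV from stdin.
mark_shared() {
  awk -F, '
    {
      line[NR] = $0            # remember the line in input order
      key[NR]  = $2 SUBSEP $7  # same composite key that a[$2,$7] builds
      count[key[NR]]++         # how many lines carry this pair
    }
    END {
      for (i = 1; i <= NR; i++) {
        suffix = (count[key[i]] > 1) ? ",shared" : ""
        print line[i] suffix
      }
    }'
}

# Usage with made-up rows: the first two share the pair (1808, 66314).
printf '%s\n' \
  'hostA,1808,x,10,1D,2D,66314' \
  'hostB,1808,x,10,1D,2D,66314' \
  'hostC,1809,x,10,1D,2D,66314' | mark_shared
```

The trade-off is memory: this keeps the whole input in arrays, whereas reading the file twice keeps only the counters.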