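bash - Add a column if two columns match

Time: 2017-10-09 06:29:21

Tags: bash awk

I am trying to add a column to a line when two of its fields also appear on another line in the file.

I have a comma-separated file with a large number of entries, and I need to find all lines where two columns match: the second and the seventh. If both values are found on more than one line, an eighth column containing "shared" should be added.

File contents: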

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314

Desired output:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

I searched and found this link, Awk - matching on 2 columns for differents lines, but it does not do quite what I need: it only matches against the following line.

I can do it like this:

while IFS=',' read -r host device blk poolnum porta portb serial
do
    # Count the lines that mention both this device and this serial.
    # This rescans the whole file for every input line, hence the slowness.
    ldev_count=$(cat outputtest.txt | grep -iw "$device" | grep -iw "$serial" | wc -l)
    if [[ $ldev_count -gt 1 ]] ; then
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial, SHARED" >> semifinal.txt
    else
        echo "$host, $device, $blk, $poolnum, $porta, $portb, $serial" >> semifinal.txt
    fi
done < outputtest.txt

But it is very slow. I am hoping to find a better solution.

Thanks for your help.

Edited for formatting

2 Answers:

Answer 0: (score: 3)

You probably need this:

awk -F\, 'NR==FNR{a[$2]++;b[$7]++;next}
          a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' file file

Result:

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

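Explanation

We iterate over the file twice.

First pass: NR==FNR{a[$2]++;b[$7]++;next}

We count how many times each value occurs in columns 2 and 7 and store the counts in the arrays a and b.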
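As a small illustration of why NR==FNR singles out the first pass, the sketch below prints both counters while awk reads the same file twice. The two-line sample file and the variable names are made up for this example. Note that the idiom assumes the first file is not empty.

```shell
# Hypothetical two-line sample file, used only for this demonstration.
f=$(mktemp)
printf '%s\n' 'a,1' 'b,2' > "$f"

# NR counts records across all inputs; FNR restarts at 1 for each file.
# They are equal only while the first file argument is being read.
trace=$(awk '{ print NR, FNR, (NR==FNR ? "first pass" : "second pass") }' "$f" "$f")
printf '%s\n' "$trace"
# 1 1 first pass
# 2 2 first pass
# 3 1 second pass
# 4 2 second pass

rm -f "$f"
```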

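Second pass: a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1

This selects the lines whose values are repeated: the count must be greater than 1 for both columns, and in that case a new trailing column is appended with $(NF+1)="shared". Assigning to a field makes awk rebuild the line with OFS, which is why OFS=',' is set.

Note: the trailing 1 is just a shortcut that avoids writing a print statement.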
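One thing worth checking with this approach: a and b are counted independently, so a line can be flagged when its second column repeats on one line and its seventh column repeats on a different line, even though the pair itself occurs only once. The sketch below uses a made-up three-line sample (hosts h1 to h3 are hypothetical) to compare the independent counters with a single counter keyed on the pair ($2,$7).

```shell
# Hypothetical sample: the pair (100,A) occurs once,
# but 100 and A each occur twice in their own column.
sample=$(mktemp)
printf '%s\n' \
  'h1,100,x,x,x,x,A' \
  'h2,100,x,x,x,x,B' \
  'h3,200,x,x,x,x,A' > "$sample"

# Independent counters, as in the command above: flags h1.
independent=$(awk -F, 'NR==FNR{a[$2]++;b[$7]++;next}
  a[$2]>1 && b[$7]>1{$(NF+1)="shared"}1' OFS=',' "$sample" "$sample")

# One counter keyed on the pair itself: flags nothing here.
paired=$(awk -F, 'NR==FNR{c[$2,$7]++;next}
  c[$2,$7]>1{$(NF+1)="shared"}1' OFS=',' "$sample" "$sample")

printf 'independent:\n%s\n\npaired:\n%s\n' "$independent" "$paired"
rm -f "$sample"
```

On the question's sample data both variants give the same result, because every (column 2, column 7) pair there occurs three times.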

Answer 1: (score: 2)

Please try the following and let me know if it helps.

awk -F, 'FNR==NR{a[$2,$7]++;next}  a[$2,$7]>1{print $0",shared"}'  Input_file  Input_file

The output is as follows.

WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX O,1809,3145728000,10,3G,4G,66314,shared
WPC PROD LINUX O,1812,4194304000,10,3G,4G,66314,shared
WPC PROD LINUX,1808,4194304000,10,1D,2D,66314,shared
WPC PROD LINUX,1809,3145728000,10,1D,2D,66314,shared
WPC PROD LINUX,1812,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1808,4194304000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1809,3145728000,10,1D,2D,66314,shared
WPCESXCS40BP01_0,1812,4194304000,10,1D,2D,66314,shared

EDIT: If you want to print the matching lines with the "shared" string appended and print the non-matching lines unchanged, the following may help.

awk -F, '           ##Creating field delimiter as comma.
FNR==NR{            ##FNR==NR is a condition which will be TRUE when first Input_file is being read.
  a[$2,$7]++;       ##creating an array named a whose index is $2,$7(second and 7th field) and incrementing its value with 1 each time same elements come.
  next              ##Using next keyword will skip all further statements.
}
a[$2,$7]>1{         ##This condition will be TRUE only when 2nd Input_file is being read, check if array a value in index of $2,$7 is greater than 1.
  print $0",shared" ##Printing the current line with keyword shared at last of line.
  next;
}
1
' Input_file Input_file ##Mentioning the Input_file twice here.
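
Both answers read the input twice, which needs a real file that can be reopened. If the data arrives on a pipe, a single-pass variant is possible, assuming the whole input fits in memory. The following is only a sketch of that idea, not part of either answer; HYPOTHETICAL_HOST and its device number 9999 are invented to show a non-shared line.

```shell
# Two lines from the question plus one invented, non-shared line.
input=$(mktemp)
printf '%s\n' \
  'WPC PROD LINUX O,1808,4194304000,10,3G,4G,66314' \
  'WPC PROD LINUX,1808,4194304000,10,1D,2D,66314' \
  'HYPOTHETICAL_HOST,9999,4194304000,10,1D,2D,66314' > "$input"

single_pass=$(awk -F, '
{
  line[NR] = $0               # remember every line in input order
  key[NR]  = $2 SUBSEP $7     # same composite key as a[$2,$7]
  count[key[NR]]++            # how often this pair has been seen
}
END {
  # Replay the stored lines, appending ",shared" where the pair repeats.
  for (i = 1; i <= NR; i++)
    print line[i] (count[key[i]] > 1 ? ",shared" : "")
}' "$input")

printf '%s\n' "$single_pass"
rm -f "$input"
```

The trade-off is memory: every line is held in an array until the END block runs, whereas the two-pass versions store only the counters.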