My data looks like this (I added a "d" to the end of the index on the rows I expect to be deleted); the original file is not necessarily sorted. PPBondedFieldOver2NeedsFixing.csv:
ABR: 1-1-1-41,2298961,578766
ABRd: 1-1-1-42,9109441,1581024
ABRd: 1-1-1-45,9109441,1581024
ABRd: 1-1-1-46,9109441,1581024
ABRd: 1-1-1-43,9109442,10612609
ABRd: 1-1-1-43,9109442,10612609
ABRd: 1-1-1-44,9109442,10612609
ABRd: 1-1-1-45,9109443,14210513
ABRd: 1-1-1-46,9109443,14210513
ABRd: 1-1-1-47,9109443,14210513
ABR: 1-1-1-45,9109444,14210513
ABR: 1-1-1-46,9109444,14210513
ABR: 1-1-2-23,9109445,1761077
ABR: 1-1-2-24,9109445,1761077
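I'm trying to find a shell command that deletes the rows whose value between the commas (the second field) occurs more than 2 times. Later I will have a similar file from which I need to remove the rows whose value occurs more than once.
I'm trying to get a list of the values that occur more than 2 times, but this gives me a long blank output: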
cut -d "," -f 2 PPBondedFieldOver2NeedsFixing.csv | sort | uniq | gawk '$1>2{print $2}'
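For this I tried to follow "list of ip's occurring more than 3 times", but that case is different because the input there has only one column.
I want my output to end up looking like this: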
ABR: 1-1-1-41,2298961,578766
ABR: 1-1-1-45,9109444,14210513
ABR: 1-1-1-46,9109444,14210513
ABR: 1-1-2-23,9109445,1761077
ABR: 1-1-2-24,9109445,1761077
My list of values that occur more than 2 times (that is, 3 or more) would look like this:
9109441
9109442
9109443
Answer 0: (score: 1)
This can be done with a single awk, without invoking multiple expensive commands such as sort:
awk -F, 'FNR == NR { counts[$2]++; next }
counts[$2] > 2 && !seen[$2]++{print $2 > "tmpFile"}
counts[$2] <= 2' PPBondedFieldOver2NeedsFixing.csv{,}
ABR: 1-1-1-41,2298961,578766
ABR: 1-1-1-45,9109444,14210513
ABR: 1-1-1-46,9109444,14210513
ABR: 1-1-2-23,9109445,1761077
ABR: 1-1-2-24,9109445,1761077
cat tmpFile
9109441
9109442
9109443
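Since the question mentions a later file where the limit is 1 instead of 2, the same two-pass idea can be parameterized. The following is only a minimal sketch and not part of the original answer: the function name `filter_by_count`, the awk variable `max`, and the temporary sample file are assumptions made for illustration. It passes the file name twice explicitly rather than relying on the `{,}` brace expansion, which is a bash feature.

```shell
#!/bin/sh
# Hypothetical sample file: the rows from the question, without the "d" markers.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.csv"
cat > "$file" <<'EOF'
ABR: 1-1-1-41,2298961,578766
ABR: 1-1-1-42,9109441,1581024
ABR: 1-1-1-45,9109441,1581024
ABR: 1-1-1-46,9109441,1581024
ABR: 1-1-1-43,9109442,10612609
ABR: 1-1-1-43,9109442,10612609
ABR: 1-1-1-44,9109442,10612609
ABR: 1-1-1-45,9109443,14210513
ABR: 1-1-1-46,9109443,14210513
ABR: 1-1-1-47,9109443,14210513
ABR: 1-1-1-45,9109444,14210513
ABR: 1-1-1-46,9109444,14210513
ABR: 1-1-2-23,9109445,1761077
ABR: 1-1-2-24,9109445,1761077
EOF

# filter_by_count MAX FILE (hypothetical helper name):
# prints the rows whose second field occurs at most MAX times in FILE.
filter_by_count() {
    awk -F, -v max="$1" '
        FNR == NR { counts[$2]++; next }  # first pass: count each value of field 2
        counts[$2] <= max                 # second pass: print rows within the limit
    ' "$2" "$2"
}

kept2=$(filter_by_count 2 "$file")   # limit used in this question
kept1=$(filter_by_count 1 "$file")   # limit for the later, similar file

printf 'limit 2:\n%s\n' "$kept2"
printf 'limit 1:\n%s\n' "$kept1"

rm -rf "$tmpdir"
```

With the sample rows, a limit of 2 keeps the five rows listed as the desired output in the question, and a limit of 1 keeps only the 2298961 row.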
Answer 1: (score: 0)
Here is how I got the list of values that occur more than twice to work:
cut -d "," -f 2 PPBondedFieldOver2NeedsFixing.csv | sort | awk '++A[$1]>2'
This returns:
9109441
9109442
9109443
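One caveat: `++A[$1]>2` prints a value every time its running count is above 2, so a value that occurs four times would be printed twice. In the sample data every over-limit value occurs exactly three times, which is why each one appears only once. A possible alternative, sketched here under the assumption that the input is the file from the question, is the pipeline from the question with `uniq -c` in place of plain `uniq`. `uniq -c` prefixes each distinct line with its count, so awk sees the count in `$1` and the value in `$2`. The temporary sample file is a hypothetical reproduction of the question's rows.

```shell
#!/bin/sh
# Hypothetical sample file: the rows from the question, without the "d" markers.
tmpdir=$(mktemp -d)
file="$tmpdir/sample.csv"
cat > "$file" <<'EOF'
ABR: 1-1-1-41,2298961,578766
ABR: 1-1-1-42,9109441,1581024
ABR: 1-1-1-45,9109441,1581024
ABR: 1-1-1-46,9109441,1581024
ABR: 1-1-1-43,9109442,10612609
ABR: 1-1-1-43,9109442,10612609
ABR: 1-1-1-44,9109442,10612609
ABR: 1-1-1-45,9109443,14210513
ABR: 1-1-1-46,9109443,14210513
ABR: 1-1-1-47,9109443,14210513
ABR: 1-1-1-45,9109444,14210513
ABR: 1-1-1-46,9109444,14210513
ABR: 1-1-2-23,9109445,1761077
ABR: 1-1-2-24,9109445,1761077
EOF

# Extract field 2, sort so equal values are adjacent, count each distinct
# value with uniq -c, then keep the values whose count is greater than 2.
over=$(cut -d, -f2 "$file" | sort | uniq -c | awk '$1 > 2 { print $2 }')
printf '%s\n' "$over"

rm -rf "$tmpdir"
```

Because the counting happens once per distinct value, each over-limit value is printed exactly once regardless of how many times it occurs.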