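I'm trying to print the whole line that holds the maximum value of the last column, grouped by the second-to-last column.
Input file: file1.txt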
2019-01-16 08:00:00.0 test1 28848859233
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test1 4194204503
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test5 35371907528
2019-01-14 08:00:00.0 test1 115598553811
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test4 346377518945
Expected output:
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-14 08:00:00.0 test4 346377518945
2019-01-15 08:00:00.0 test5 35371907528
When I only print the driving column (3) and the maximum of the required column (4), it works:
awk '{if (a[$3] < $4) {a[$3]=$4}} END {PROCINFO["sorted_in"] = "@ind_num_asc" ; for (i in a) {print i, a[i]}}' file1.txt
test1 115598553821
test2 59736397346
test3 5508381147
test4 346377518945
test5 35371907528
I tried the command below to print the whole line, but it doesn't work:
awk '{if (a[$3] < $4) {a[$3]=$4;b[$0]=a[$3]}} END {PROCINFO["sorted_in"] = "@ind_num_asc" ;for (i in b) {print i, b[i]}}' file1.txt
2019-01-15 08:00:00.0 test4 39377518945 39377518945
2019-01-15 08:00:00.0 test2 59736397346 59736397346
2019-01-15 08:00:00.0 test3 5508381147 5508381147
2019-01-16 08:00:00.0 test2 902006478 902006478
2019-01-14 08:00:00.0 test4 346377518945 346377518945
2019-01-15 08:00:00.0 test5 35371907528 35371907528
2019-01-15 08:00:00.0 test1 115598553821 115598553821
2019-01-16 08:00:00.0 test3 5385892905 5385892905
2019-01-16 08:00:00.0 test1 28848859233 28848859233
Answer 0 (score: 1)
First solution: could you please try the following.
awk '
{
a[$3]=$NF>a[$3]?$NF:a[$3]
b[$3,$NF]=$1 OFS $2
}
END{
for(i in a){
print b[i,a[i]],i,a[i]
}
}' Input_file
Second solution: the following prints the results in the same order in which the values of $3 (the third field) first appear in Input_file.
awk '
!c[$3]++{
d[++count]=$3
}
{
a[$3]=$NF>a[$3]?$NF:a[$3]
b[$3,$NF]=$1 OFS $2
}
END{
for(i=1;i<=count;i++){
print b[d[i],a[d[i]]],d[i],a[d[i]]
}
}' Input_file
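To see the ordering claim in action, here is a small sketch that feeds the second solution the question's sample lines in reverse order. Reversing the data is just an assumption made for the demonstration, and the sample shell function is a hypothetical helper that holds the data. Because test4 now appears first in the input, it should be printed first.

```shell
# Hypothetical helper: the question's sample data, in reverse line order.
sample() {
  cat <<'EOF'
2019-01-14 08:00:00.0 test4 346377518945
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test1 115598553811
2019-01-15 08:00:00.0 test5 35371907528
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test1 115598553821
2019-01-16 08:00:00.0 test1 4194204503
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test1 28848859233
EOF
}

# Second solution, unchanged, reading the reversed data from stdin.
result=$(sample | awk '
!c[$3]++{
  d[++count]=$3
}
{
  a[$3]=$NF>a[$3]?$NF:a[$3]
  b[$3,$NF]=$1 OFS $2
}
END{
  for(i=1;i<=count;i++){
    print b[d[i],a[d[i]]],d[i],a[d[i]]
  }
}')

# Keys come out in first-appearance order: test4, test3, test1, test5, test2.
printf '%s\n' "$result"
```

The maxima are the same as in the expected output. Only the order changes, because d[] remembers the order in which each $3 value was first seen.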
Explanation of the above code:
awk '
!c[$3]++{ ##True only the first time this $3 value is seen (c[$3] is still 0), so each distinct $3 is recorded exactly once.
d[++count]=$3 ##Store $3 in array d at index count (incremented first), which remembers the order of first appearance.
} ##Closing block for the !c[$3]++ condition.
{ ##Starting block which will execute for every line of Input_file.
a[$3]=$NF>a[$3]?$NF:a[$3] ##Keep in a[$3] the largest $NF seen so far for this $3: take $NF if it is greater than a[$3], otherwise leave a[$3] unchanged.
b[$3,$NF]=$1 OFS $2 ##Store the date and time ($1 OFS $2) in array b, indexed by the pair ($3,$NF).
} ##Closing block here.
END{ ##Starting END block of awk program here.
for(i=1;i<=count;i++){ ##Loop from i=1 up to count, i.e. in order of first appearance of $3.
print b[d[i],a[d[i]]],d[i],a[d[i]] ##Print the date/time stored for the pair (key, max value), then the key d[i], then its maximum a[d[i]].
} ##Closing block of the for loop.
}' Input_file ##Mentioning Input_file name here.
EDIT: Added the reason why the OP's attempt does not work.
OP's code:
awk '{if (a[$3] < $4) {a[$3]=$4;b[$0]=a[$3]}} END {PROCINFO["sorted_in"] = "@ind_num_asc" ;for (i in b) {print i, b[i]}}' file1.txt
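Explanation (IMHO) of why the code doesn't work: array b is indexed by the whole line ($0). Every line that sets a new maximum for its $3 therefore adds a new element to b, and the earlier elements for that same $3 are never deleted or overwritten. That is why looping over array b prints every line that was ever a running maximum, not just the final one. Array b has to be keyed by $3 so that each new maximum replaces the previous line for that key.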
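A quick way to check this explanation is to count how many elements end up in b. The sketch below runs the OP's logic twice, once with b keyed by $0 and once with b keyed by $3. The sample shell function is a hypothetical helper that holds the question's data.

```shell
# Hypothetical helper: the question's sample data (file1.txt).
sample() {
  cat <<'EOF'
2019-01-16 08:00:00.0 test1 28848859233
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test1 4194204503
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test5 35371907528
2019-01-14 08:00:00.0 test1 115598553811
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test4 346377518945
EOF
}

# OP's logic: b keyed by the whole line, so every new running maximum adds an element.
by_line=$(sample | awk '{ if (a[$3] < $4) { a[$3] = $4; b[$0] = a[$3] } }
                       END { n = 0; for (i in b) n++; print n }')

# Same logic with b keyed by $3: each new maximum overwrites the previous line.
by_key=$(sample | awk '{ if (a[$3] < $4) { a[$3] = $4; b[$3] = $0 } }
                      END { n = 0; for (i in b) n++; print n }')

echo "keyed by \$0: $by_line entries"   # 9 with this data, one per running maximum
echo "keyed by \$3: $by_key entries"    # 5 with this data, one per test name
```

The 9 entries correspond to the 9 lines in the OP's wrong output above.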
Answer 1 (score: 1)
Please try the following:
$ awk '!n[$3] || n[$3]<$4{n[$3]=$4;l[$3]=$0;}END{for(i in l) print l[i]}' file1.txt
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-14 08:00:00.0 test4 346377518945
2019-01-15 08:00:00.0 test5 35371907528
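For brevity and efficiency, I moved the condition out of the action block and made it the pattern.
Also, I changed the key to $3; you were using the whole line ($0) as the key.
Since you want to output whole lines, the lines should be the values and the column-3 value should be the key.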
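Moving the condition out works because in awk a pattern in front of a block behaves like an if around the whole action. The sketch below runs both forms and compares their output. Results are piped through sort because the order of for (i in l) is unspecified. The sample shell function is a hypothetical helper that holds the question's data.

```shell
# Hypothetical helper: the question's sample data (file1.txt).
sample() {
  cat <<'EOF'
2019-01-16 08:00:00.0 test1 28848859233
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test1 4194204503
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test5 35371907528
2019-01-14 08:00:00.0 test1 115598553811
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test4 346377518945
EOF
}

# Condition written inside the action block.
inside=$(sample | awk '{ if (!n[$3] || n[$3] < $4) { n[$3] = $4; l[$3] = $0 } }
                       END { for (i in l) print l[i] }' | sort)

# Same condition used as the pattern, as in the answer.
outside=$(sample | awk '!n[$3] || n[$3] < $4 { n[$3] = $4; l[$3] = $0 }
                        END { for (i in l) print l[i] }' | sort)

[ "$inside" = "$outside" ] && echo "same output"   # prints "same output"
```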
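Answer 2 (score: 1)
A non-awk solution using the ever-handy GNU datamash (-W splits fields on whitespace, -s sorts the input before grouping, -f prints the full line followed by the max, and cut -f 1-4 drops that extra column):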
$ datamash -Wsf groupby 3 max 4 < example.txt | cut -f 1-4
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-14 08:00:00.0 test4 346377518945
2019-01-15 08:00:00.0 test5 35371907528
Answer 3 (score: 1)
With sort/awk:
$ sort -k3,3 -k4nr file | awk '!a[$3]++'
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-14 08:00:00.0 test4 346377518945
2019-01-15 08:00:00.0 test5 35371907528
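This works because sort -k3,3 groups the lines by field 3, and -k4nr puts the largest field-4 value first within each group. awk '!a[$3]++' is true only the first time a given $3 is seen, so it keeps exactly that first (largest) line. The sketch below prints the intermediate sorted lines for test1 and then the final result. The sample shell function is a hypothetical helper that holds the question's data.

```shell
# Hypothetical helper: the question's sample data (file1.txt).
sample() {
  cat <<'EOF'
2019-01-16 08:00:00.0 test1 28848859233
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test1 4194204503
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test5 35371907528
2019-01-14 08:00:00.0 test1 115598553811
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test4 346377518945
EOF
}

# Step 1: group by field 3, largest field 4 first within each group.
sorted=$(sample | sort -k3,3 -k4nr)

# The test1 group after sorting:
#   2019-01-15 08:00:00.0 test1 115598553821
#   2019-01-14 08:00:00.0 test1 115598553811
#   2019-01-16 08:00:00.0 test1 28848859233
#   2019-01-16 08:00:00.0 test1 4194204503
printf '%s\n' "$sorted" | grep ' test1 '

# Step 2: keep only the first line seen for each field-3 value.
result=$(printf '%s\n' "$sorted" | awk '!a[$3]++')
printf '%s\n' "$result"
```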
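Answer 4 (score: 0)
I found the problem. When the condition is met and the maximum for the driving column (3) is saved with a[$3]=$4, I should store the whole line $0 in array b under the same driving-column key (b[$3]=$0). Instead, I was putting array a's value into array b keyed by the whole line. Like this: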
awk '{if (a[$3] < $4) {a[$3]=$4;b[$3]=$0}} END {PROCINFO["sorted_in"] = "@ind_num_asc" ;for (i in b) {print b[i]}}' file1.txt
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-14 08:00:00.0 test4 346377518945
2019-01-15 08:00:00.0 test5 35371907528
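One caveat: PROCINFO["sorted_in"] is a gawk extension. In other awk implementations, such as mawk, it is just an ordinary array element, so the order of for (i in b) is unspecified there. Also, according to the gawk manual, @ind_num_asc treats non-numeric indices such as test1 as if they were zero, so @ind_str_asc is arguably the better fit for these keys. If portability matters, one option is to drop PROCINFO and let sort order the lines by field 3, as in the minimal sketch below. The sample shell function is a hypothetical helper that holds the question's data.

```shell
# Hypothetical helper: the question's sample data (file1.txt).
sample() {
  cat <<'EOF'
2019-01-16 08:00:00.0 test1 28848859233
2019-01-16 08:00:00.0 test2 902006478
2019-01-16 08:00:00.0 test3 5385892905
2019-01-16 08:00:00.0 test1 4194204503
2019-01-15 08:00:00.0 test1 115598553821
2019-01-15 08:00:00.0 test2 59736397346
2019-01-15 08:00:00.0 test3 5508381147
2019-01-15 08:00:00.0 test4 39377518945
2019-01-15 08:00:00.0 test5 35371907528
2019-01-14 08:00:00.0 test1 115598553811
2019-01-14 08:00:00.0 test3 5408381147
2019-01-14 08:00:00.0 test4 346377518945
EOF
}

# Same logic as above, without the gawk-only PROCINFO; sort handles the ordering.
result=$(sample | awk '{ if (a[$3] < $4) { a[$3] = $4; b[$3] = $0 } }
                       END { for (i in b) print b[i] }' | sort -k3,3)
printf '%s\n' "$result"
```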