Question

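I have a three-column file. For each group of rows that share the same first column, I want to find the maximum value of the third column, and I want the output to include the second column as well.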

Input:

1   234   0.005
1   235   0.060
1   236   0.001
2   234   0.010
2   235   0.003
2   236   0.003
3   234   0.004
3   235   0.100
3   236   0.004

Desired output:

1   235   0.060
2   234   0.010
3   235   0.100

I found this hint in an earlier question, but I don't know how to carry the second column along:

!($1 in max) || $3>max[$1] { max[$1] = $3 }
END {
     PROCINFO["sorted_in"] = "@ind_num_asc"
     for (key in max) {
         print key, max[key]
         }
     }

Answer 1

You can use this awk (the PROCINFO["sorted_in"] setting is specific to GNU awk):

awk '!($1 in max) || $3 > max[$1] { max[$1] = $3; two[$1] = $2 }
END { PROCINFO["sorted_in"] = "@ind_num_asc"
   for (i in max) print i, two[i], max[i]
}' file

1 235 0.060
2 234 0.010
3 235 0.100

Answer 2

$ sort -k1n -k3nr file | uniq -w 1
1   235   0.060
2   234   0.010
3   235   0.100

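Sort numerically on field 1, then on field 3 in reverse, and let uniq compare only the first character of each line. Because -w 1 looks at a single character, this works only while the first column is one character wide.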
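To illustrate the width limitation, here is a minimal sketch on a hypothetical sample (not from the question) where the key 10 sits next to the key 1. It assumes GNU uniq for the -w option, and it swaps in an awk filter keyed on the whole first field for comparison. The variable names are made up for the example.

```shell
# Hypothetical sample: a two-digit key (10) next to a one-digit key (1).
input='1 234 0.005
1 235 0.060
10 234 0.010
10 235 0.003'

# GNU uniq -w 1 compares only the first character, so "1" and "10" collapse into one group.
narrow=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -k3,3nr | uniq -w 1)

# Keeping the first line seen for each whole first field keeps the groups apart.
by_field=$(printf '%s\n' "$input" | LC_ALL=C sort -k1,1n -k3,3nr | awk '!seen[$1]++')

printf 'uniq -w 1:\n%s\n' "$narrow"
printf 'whole field:\n%s\n' "$by_field"
```

With this sample, the uniq -w 1 pipeline keeps only the row for key 1, while the field-based filter keeps one row per key.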

Another approach, using GNU awk:

$ awk '{
    a[$1][$3]=$0 }
END {   
    PROCINFO["sorted_in"]="@ind_num_asc"       # first for in ascending order
    for(i in a) {
        PROCINFO["sorted_in"]="@ind_num_desc"  # next for in descending
        for(j in a[i]) {
            print a[i][j]
            break
        }
    }
}' file
1   235   0.060
2   234   0.010
3   235   0.100

Answer 3

Could you please try the following. It should print the output in the same order in which the keys first appear in Input_file.

awk '
!a[$1]++{
  b[++count]=$1
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)
}
END{
  for(i=1;i<=count;i++){
    print d[b[i]],c[b[i]]
  }
}'  Input_file

The output is as follows.

1 235 0.060
2 234 0.010
3 235 0.100

Explanation: here is the same code with comments added.

awk '
!a[$1]++{                              ##If $1 has not been seen before (array a counts occurrences), do the following.
  b[++count]=$1                        ##Store $1 in array b at the next value of count, recording first-appearance order.
}
{
  c[$1]=(c[$1]>$NF?c[$1]:$NF)          ##Array c, indexed by $1: keep the stored value if it is greater than $NF, else replace it with $NF.
  d[$1]=(c[$1]>$NF?d[$1]:$1 OFS $2)    ##Array d, indexed by $1: keep the stored value if c[$1] is still greater than $NF, else set it to $1 OFS $2.
}
END{                                   ##Starting the END block of the awk program here.
  for(i=1;i<=count;i++){               ##Loop i from 1 up to the value of count.
    print d[b[i]],c[b[i]]              ##Print array d at index b[i], followed by array c at index b[i].
  }
}' Input_file                          ##Mentioning the Input_file name here.
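One possible caveat, which goes beyond the data shown in the question: comparing against an unset c[$1] may treat it as 0, so a group whose values are all negative might not be recorded as expected. A minimal sketch that tests membership explicitly and still keeps first-appearance order could look like this. The array names (order, max, two) and the sample data are hypothetical.

```shell
# Hypothetical sample: every value is negative, and key 2 appears before key 1.
input='2 234 -0.010
2 235 -0.003
1 234 -0.5
1 235 -0.7'

result=$(printf '%s\n' "$input" | awk '
!($1 in max) { order[++count] = $1 }        # remember keys in first-appearance order
!($1 in max) || ($3 + 0) > (max[$1] + 0) {  # first row of a group, or a new numeric maximum
  max[$1] = $3
  two[$1] = $2
}
END {
  for (i = 1; i <= count; i++)
    print order[i], two[order[i]], max[order[i]]
}')

printf '%s\n' "$result"
```

The `+ 0` forces a numeric comparison, while the stored value keeps the original text of the third column for printing.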

Answer 4

This should work in any modern awk (not only GNU):

$ awk '!a[$1]||$3>b[$1]{a[$1]=$0;b[$1]=$3} END {for(i in a)print a[i]}' file | sort -n

Broken out for easier reading:

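!a[$1] || $3>b[$1] - If we have not seen this first column before, or the third column beats our previous record,
{a[$1]=$0;b[$1]=$3} - then store the current line in one array and the comparison value in another.
END {for(i in a)print a[i]} - Once all input has been processed, print each line held in the storage array.
sort -n - Sort numerically. This should work with any kind of sort.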

Clear as mud?

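This solution deliberately stores the entire line ($0) rather than the contents of individual fields, so its output is the input line itself rather than a re-creation of it. That can be useful if you are happy with default field splitting for the purpose of picking out the fields to compare, but want the output to match columnated or tab-separated input.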
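As a rough illustration of that difference, the sketch below runs two variants on a hypothetical tab-separated sample: one stores $0, the other stores fields and rebuilds the line with print. Both use `!($1 in a)` as the "not seen yet" test, which is a substitution made for this example; the variable names are also made up.

```shell
# Hypothetical tab-separated sample.
input=$(printf '1\t234\t0.005\n1\t235\t0.060\n2\t234\t0.010')

# Variant 1: store the whole line, so the tabs come back out untouched.
whole=$(printf '%s\n' "$input" |
  awk '!($1 in a) || $3>b[$1] {a[$1]=$0; b[$1]=$3} END {for (i in a) print a[i]}' |
  LC_ALL=C sort -n)

# Variant 2: store fields and rebuild the line; print joins them with OFS (one space by default).
rebuilt=$(printf '%s\n' "$input" |
  awk '!($1 in a) || $3>b[$1] {a[$1]=$2; b[$1]=$3} END {for (i in a) print i, a[i], b[i]}' |
  LC_ALL=C sort -n)

printf 'whole line:\n%s\n' "$whole"
printf 'rebuilt:\n%s\n' "$rebuilt"
```

The first variant prints the winning rows with their original tabs, while the second prints the same values separated by single spaces.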

Answer 5

$ sort -k1,1n -k3,3nr file | awk '!seen[$1]++'
1   235   0.060
2   234   0.010
3   235   0.100
