Question

I have 3 columns:

a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

Expected output:

a 10 x
b 20 w
c 12 z

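That is, I need to sort on column 2 without changing the order of column 1, and then, based on column 2, grep the line with the maximum value for each column 1 value.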

Answer 1

Two approaches (pick whichever you prefer):

1) The sort + uniq "trick":

sort -k1,1 -k2,2rn file | uniq -w1

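-k1,1 - sort by the first field first
-k2,2rn - then sort by the second field numerically, in reverse order
uniq -w1 - output unique lines, comparing no more than the first 1 character of each line, so only the first line of each group survives (adjust -w<number> to the width of your key; -w is a GNU uniq option)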

Output:

a 10 x
b 20 w
c 12 z
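
To try the sort + uniq approach end to end, here is a minimal sketch. The function name max_per_group is an illustrative assumption, not part of the answer, and the sketch assumes GNU uniq (for -w) and single-character keys in column 1.

```shell
#!/bin/sh
# Hypothetical helper: print the line with the largest 2nd field for each
# value of the 1st field. Assumes GNU uniq (-w) and one-character keys.
max_per_group() {
    # Group by field 1, largest field 2 first, then keep the first line per key.
    sort -k1,1 -k2,2rn "$@" | uniq -w1
}

# Sample data from the question, fed on standard input.
printf '%s\n' 'a 03 w' 'a 10 x' 'a 01 y' 'b 20 w' 'b 01 x' \
              'c 02 w' 'c 10 y' 'c 12 z' | max_per_group
# prints:
# a 10 x
# b 20 w
# c 12 z
```

If the keys in column 1 are wider than one character, -w1 would merge keys that share a first letter, so the width would need to be raised accordingly.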

2) Simply use the GNU datamash tool:

datamash -Wsf -g1 max 2 <file | cut -f1-3

Output:

a   10  x
b   20  w
c   12  z

Answer 2

Input

$ cat infile
a 03 w
a 10 x
a 01 y
b 20 w
b 01 x
c 02 w
c 10 y
c 12 z

Output

$ awk -F'[[:blank:]]' '{f=($1 in b)}f && b[$1]<$2 || !f{a[$1]=$0;b[$1]=$2}END{for(i in a)print a[i]}' infile
a 10 x
b 20 w
c 12 z

For better readability:

awk -F'[[:blank:]]' '
{
    f = ($1 in b)
}
f && b[$1] < $2 || !f {
    a[$1] = $0
    b[$1] = $2
}
END {
    for (i in a)
        print a[i]
}
' infile

Explanation

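-F'[[:blank:]]' - sets the input field separator
f=($1 in b) - variable f holds a boolean (true=1/false=0), depending on whether the key ($1) already exists in array b
f && b[$1]<$2 || !f - true if f is true and the stored value (b[$1]) is less than the second column of the current line ($2), or (||) if !f, meaning the array does not yet contain the key we are looking for
a[$1]=$0; - array a, keyed by the first column ($1), holds the entire line/record ($0)
b[$1]=$2 - array b, keyed by the first column of the current line ($1), holds the second field value ($2)
END { for(i in a) print a[i] } - the END block loops over array a and prints its values. The order of for(i in a) is unspecified in awk, so pipe the result through sort if the order of the groups matters.

Note: modify -F accordingly to match your file's field separator.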
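
Because the for-in loop does not guarantee any order, here is a minimal sketch of a variant that prints the groups in the order their keys first appear in the input. The function name max_per_group_ordered and the array names best, max and order are illustrative assumptions; the sketch also relies on awk's default whitespace field splitting rather than -F'[[:blank:]]'.

```shell
#!/bin/sh
# Hypothetical helper: same idea as the awk answer above, but it records the
# order in which keys first appear so the END block can replay that order.
max_per_group_ordered() {
    awk '
        !($1 in best) { order[++n] = $1 }      # first time this key is seen
        !($1 in best) || $2 + 0 > max[$1] {    # new key, or a larger value
            best[$1] = $0                      # remember the whole line
            max[$1]  = $2 + 0                  # remember its numeric value
        }
        END { for (i = 1; i <= n; i++) print best[order[i]] }
    ' "$@"
}

# Keys appear as c, then a, and are printed in that same order.
printf '%s\n' 'c 02 w' 'c 12 z' 'a 03 w' 'a 10 x' | max_per_group_ordered
# prints:
# c 12 z
# a 10 x
```

Adding 0 to $2 forces a numeric comparison, so zero-padded values such as 03 and 10 compare as numbers.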

Answer 3

You can use the UNIX commands sort and awk:

sort -k1,1 -k2,2nr file | awk '!seen[$1]++'

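To apply them to the buffer in vim (the range % goes before the !, and the ! inside the awk program must be escaped so that vim does not expand it):

:%!sort -k1,1 -k2,2nr | awk '\!seen[$1]++'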

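Explanation:

The sort command sorts the input on two levels: first by column 1, then by column 2 (numerically, in reverse order). This gives you the following intermediate output: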

a 10 x
a 03 w
a 01 y
b 20 w
b 01 x
c 12 z
c 10 y
c 02 w

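We pipe this into a small awk script that maintains an array variable seen, indexed by column 1. Because the test is negated by !, a line is printed only the first time its column 1 value appears; once a value has been seen, later lines with the same value are not printed: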

a 10 x  <-- print
a 03 w
a 01 y
b 20 w  <-- print
b 01 x
c 12 z  <-- print
c 10 y
c 02 w
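
To make the role of the post-increment visible, here is a minimal sketch that prints, for each sorted line, the value seen[$1] had before the increment and the resulting decision. The function name trace_seen and the tab-separated trace format are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: trace what the filter !seen[$1]++ decides per line.
trace_seen() {
    sort -k1,1 -k2,2nr "$@" | awk '{
        count = seen[$1]++      # post-increment: the old value is returned
        printf "%s\tseen=%d\t%s\n", $0, count, (count ? "skip" : "print")
    }'
}

printf '%s\n' 'a 03 w' 'a 10 x' 'b 01 x' | trace_seen
```

The old value is 0 only for the first line of each group, so !seen[$1]++ is true exactly once per key.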

Answer 4

Give this a try.

awk '
{
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;
  a[$1]=a[$1]>$2?a[$1]:$2;
}
END{
  for(i in a){
     print b[i]
}
}
'   Input_file

Explanation

awk '
{                                    ## Main block, runs for every line.
  b[$1]=a[$1]>$2?(b[$1]?b[$1]:$0):$0;## b[$1] holds the best line seen so far for key $1: if the stored maximum a[$1] is greater than $2, keep b[$1] (or take the current line if b[$1] is still empty); otherwise replace it with the current line.
  a[$1]=a[$1]>$2?a[$1]:$2;           ## a[$1] holds the largest $2 seen so far for key $1: keep it if it is greater than $2, otherwise set it to the current $2.
}
END{                                 ## END block, runs after all input has been read.
  for(i in a){                       ## Loop over the keys of array a.
     print b[i]                      ## a and b share the same keys, so print the saved line from b for each key.
}
}
'  Input_file                        ## Name of the input file.
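
One caveat with the script above: an unset a[$1] compares as 0, so on the first line of a group with a negative value the test a[$1]>$2 is already true and a[$1] is left unset, which means a group whose values are all negative may not yield its true maximum. Here is a minimal sketch that sidesteps this with an explicit "in" test. The function name max_per_group_signed and the array names max and line are illustrative assumptions, and the trailing sort is only there to make the output order predictable.

```shell
#!/bin/sh
# Hypothetical helper: per-key maximum that also works for negative values,
# because a key is initialised from its first line rather than from 0.
max_per_group_signed() {
    awk '
        !($1 in max) || $2 + 0 > max[$1] {   # unseen key, or a larger value
            max[$1]  = $2 + 0
            line[$1] = $0
        }
        END { for (k in line) print line[k] }
    ' "$@" | sort -k1,1
}

printf '%s\n' 'a -5 x' 'a -3 y' 'b -1 w' 'b -7 z' | max_per_group_signed
# prints:
# a -3 y
# b -1 w
```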
