Question

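I have a tab-delimited file that I need to sort by the length of its first field. I found a sample one-liner that should do this for me, but it gives very strange results: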

awk -F\t '{print length($1) " " $0|"sort -rn"}' SpanishGlossary.utf8 | sed 's/^.[^>]*>/>/' > test.tmp

...gives this (a few representative samples; it is a very long file):

56 cafés especiales y orgánicos special and organic coffees
56 amplia experiencia gerencial broad managerial experience
55 una fundada confianza en que a well-founded confidence that
55 Servicios de Desarrollo Empresarial  Business Development Services
...
6 son estas are these
6 son entregadas a  are given to
6 son determinantes para    are crucial for
6 son autolimitativos   are self-limiting
...
0 tal grado de  such a degree of
0 tales such
0 tales propósitos  such purposes
0 tales principios  such principles
0 tales o cuales    this or that

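The leading number should be the length of the first field, but it clearly is not. I can't tell what is actually being counted.

What am I doing wrong? Thanks.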

Answer 1

Try this:

awk '$0=length($1) FS $0' file | sort -nr | sed -r 's/^\S*\s//'

Test:

kent$  cat f
as foo
a foo
aaa foo
aaaaa foo
aaaa foo

kent$  awk '$0=length($1) FS $0' f|sort -nr|sed -r 's/^\S*\s//'
aaaaa foo
aaaa foo
aaa foo
as foo
a foo

Here I used whitespace (the default) as awk's FS; if you need tab, add -F'\t' (with the quotes).
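The quoting matters, and it likely explains the odd counts in the question. Unquoted, the shell strips the backslash from -F\t, so awk receives -Ft. GNU awk outside compatibility mode then splits on the letter t (some other awk implementations treat -Ft as tab, so behaviour varies). That is consistent with the sample output: "son estas" is counted as 6 ("son es", up to the first t), and lines starting with t are counted as 0. Below is a minimal sketch with the separator quoted. The three-line input is hypothetical sample data, and `cut` replaces the sed step to strip the length prefix.

```shell
# Literal tab character, used as sort's field delimiter.
tab=$(printf '\t')

# Hypothetical two-column, tab-delimited glossary (not the asker's real file).
input=$(printf '%s\t%s\n' 'tales' 'such' 'son estas' 'are these' 'tal grado de' 'such a degree of')

# Decorate each line with the length of field 1, sort numerically
# in descending order on that prefix, then strip the prefix again.
result=$(printf '%s\n' "$input" |
  awk -F'\t' '{ print length($1) "\t" $0 }' |
  sort -t "$tab" -k1,1nr |
  cut -f2-)

printf '%s\n' "$result"
```

With this input the first fields have lengths 5, 9 and 12, so the "tal grado de" line should come out first and the "tales" line last.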

Edit

Adding an awk-only (GNU awk) one-liner for @Jaypal.

I mention gawk because it has asort and asorti, which we can use for the sorting.

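I also changed the input file to add some lines whose first field ($1) has the same length.

Passing "@val_num_asc" (or the desc variant) as the third argument, as in asort or asorti(a,b,"..."), would be better.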

kent$  cat f
as foo
a foo
aaa foo
ccc foo
aaaaa foo
bbbbb foo
aaaa foo

kent$  awk '{a[length($1)"."NR]=$0}END{asorti(a,b);for(i=NR;i>0;i--)print a[b[i]]}' f
bbbbb foo
aaaaa foo
aaaa foo
ccc foo
aaa foo
as foo
a foo
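One caveat with the one-liner above: by default asorti compares the indices as strings, so a key such as "10.3" sorts before "9.1", and first fields of ten or more characters would end up in the wrong place. The sketch below is one possible workaround, not part of the original answer: it zero-pads both parts of the key so that string order matches numeric order. The pad width of 8 and the sample input are assumptions made for illustration.

```shell
# Hypothetical input with first-field lengths 1, 2 and 10.
input='a foo
bb foo
cccccccccc foo'

result=$(printf '%s\n' "$input" | gawk '
  # Key is "<padded length>.<padded line number>"; NR keeps keys unique.
  { a[sprintf("%08d.%08d", length($1), NR)] = $0 }
  END {
    n = asorti(a, b)          # b[1..n] = keys of a in ascending string order
    for (i = n; i > 0; i--)   # walk backwards for longest-first output
      print a[b[i]]
  }')

printf '%s\n' "$result"
```

Because the loop walks the sorted keys backwards, lines with equal-length first fields come out in reverse input order, the same as in the test above (bbbbb before aaaaa).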
