Sorting the lines of a file by a column that contains both text and numbers

Time: 2015-12-11 11:35:50

Tags: sorting awk sed

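I have a file with a large number of lines. Each line contains some text with a number appended to it (see the example). I want the lines sorted in ascending order of that number.

Example: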

I have {few_1} lines here like this and so on
I have {few_101} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on

I need the file to look like this:

I have {few_1} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on
I have {few_101} lines here like this and so on

I tried this, but it did not work as expected:

sort -k7,7 -n filename

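Any help is greatly appreciated.

4 answers:

Answer 0 (score: 2)

You can tell sort to skip characters within a field by using the .n notation after the key's field number.

I expected -k7.5n to be the right key, since the number appears to start at position 5. It may be that sort also counts the space that acts as the default field separator, which would make the offset 6.

This also assumes that your data is as regular as the sample, and that field 7 always has exactly 4 characters before the number part. If that changes, you will have to preprocess the file, which would be a separate question for S.O.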

sort -k7.6n file

Output:

I have few lines here like this1 and so on
I have few lines here like this11 and so on
I have few lines here like this21 and so on
I have few lines here like this31 and so on
I have few lines here like this41 and so on
I have few lines here like this51 and so on
I have few lines here like this101 and so on

IHTH
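
As a quick check of the guess above about the separator space, here is a minimal sketch. It assumes a sort that follows the POSIX rule that, without -t or -b, the blanks in front of a field count as part of that field. The three-line temporary file is made up for illustration and uses the same line shape as this answer's output.

```shell
# Sample data in the shape this answer's output uses (number glued to field 7).
input=$(mktemp)
cat > "$input" <<'EOF'
I have few lines here like this101 and so on
I have few lines here like this21 and so on
I have few lines here like this1 and so on
EOF

# No -b: the blank before "this1" counts as character 1 of field 7,
# so the first digit is character 6.
counted=$(sort -k7.6n "$input")

# With b on the start position the blank is skipped, so the digit is character 5.
skipped=$(sort -k7.5bn "$input")

printf '%s\n' "$counted"
rm -f "$input"
```

Both keys should give the same order, so -k7.5bn is the form that matches the "position 5" intuition.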

Answer 1 (score: 1)

Another approach:

sort -nk2 -t_ file

This splits each line on underscores and sorts numerically on the second field.
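
A small sketch of the same idea on a shortened copy of the question's sample, with the key limited to field 2 (-k2,2n). It assumes the underscore in front of the number is the first underscore on the line; an earlier underscore would shift the field number. The temporary file is only there to make the example self-contained.

```shell
# Build a shortened copy of the question's sample in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
I have {few_1} lines here like this and so on
I have {few_101} lines here like this and so on
I have {few_21} lines here like this and so on
EOF

# -t_    : split each line into fields on underscores
# -k2,2n : compare only field 2, numerically; sort reads the leading digits
#          of "101} lines here ..." and ignores the rest
sorted=$(sort -t_ -k2,2n "$input")

printf '%s\n' "$sorted"
rm -f "$input"
```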

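Answer 2 (score: 1)

To do this regardless of what other text appears on each line:

1) Prepend the number from the {<non-close-brace>_<number>} string you want to isolate, so that it can be used for sorting: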

$ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file
1       I have {few_1} lines here like this and so on
101     I have {few_101} lines here like this and so on
21      I have {few_21} lines here like this and so on
11      I have {few_11} lines here like this and so on
31      I have {few_31} lines here like this and so on
41      I have {few_41} lines here like this and so on
51      I have {few_51} lines here like this and so on

2) Sort:

$ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file | sort -n
1       I have {few_1} lines here like this and so on
11      I have {few_11} lines here like this and so on
21      I have {few_21} lines here like this and so on
31      I have {few_31} lines here like this and so on
41      I have {few_41} lines here like this and so on
51      I have {few_51} lines here like this and so on
101     I have {few_101} lines here like this and so on

3) Remove the number you added in step 1:

$ sed -r 's/.*\{[^}]+_([0-9]+)\}.*/\1\t&/' file | sort -n | cut -f2-
I have {few_1} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on
I have {few_101} lines here like this and so on

This is a very common approach to all kinds of sorting problems.
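
The three steps above (often called decorate, sort, undecorate) can also be wrapped in a function. The sketch below is one possible variant, not the pipeline from this answer: it does the decoration with awk instead of sed, because -r and \t in the replacement are GNU sed extensions. The function name sort_by_brace_number is made up for illustration. Two behaviours differ from the sed version and are assumptions of this sketch: it uses the first `_<digits>}` on the line rather than the last, and lines without a match get key 0 and therefore sort first.

```shell
# Hypothetical helper: sort lines by the number in "_<digits>}".
# Reads the named files, or standard input when no file is given.
sort_by_brace_number() {
    # Decorate: emit "<number><TAB><original line>".
    awk '{
        key = 0                                   # assumed key for lines with no match
        if (match($0, /_[0-9]+[}]/)) {
            # Drop the leading "_" and the trailing "}" from the match.
            key = substr($0, RSTART + 1, RLENGTH - 2)
        }
        printf "%s\t%s\n", key, $0
    }' "$@" |
    sort -n |    # Sort: numeric comparison on the prepended number
    cut -f2-     # Undecorate: drop the number and the tab again
}
```

Usage would be `sort_by_brace_number file`, or at the end of a pipe: `grep few file | sort_by_brace_number`.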

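Answer 3 (score: 1)

Why didn't this work for you? When you use a character offset within a sort field, you need to set the -b option so that leading blanks are ignored. This sorts starting from that position in the key, which is probably what you want.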

$ sort -k3.6bn file

I have {few_1} lines here like this and so on
I have {few_11} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_31} lines here like this and so on
I have {few_41} lines here like this and so on
I have {few_51} lines here like this and so on
I have {few_101} lines here like this and so on
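
To illustrate what b changes here, a hedged sketch: with b the offset is counted from the first non-blank character of field 3, so character 6 of `{few_1}` is the first digit. Setting an explicit separator with -t ' ' should have the same effect, since the separator is then not part of the field; this assumes the words are separated by exactly one space. The temporary file holds a shortened copy of the question's sample.

```shell
input=$(mktemp)
cat > "$input" <<'EOF'
I have {few_101} lines here like this and so on
I have {few_21} lines here like this and so on
I have {few_1} lines here like this and so on
EOF

# b on the start position: skip the blank before "{few_1}", then start at character 6.
with_b=$(sort -k3.6bn "$input")

# Explicit single-space separator: field 3 is exactly "{few_1}", so character 6
# is again the first digit and no b is needed.
with_t=$(sort -t ' ' -k3.6n "$input")

printf '%s\n' "$with_b"
rm -f "$input"
```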