Sorting by multiple columns

Time: 2013-08-07 20:03:09

Tags: python sorting ubuntu

I get the following output:

2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type2   0.1
2013-08-05-Mon 10:17:00 type2   -0.2

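To get this output, I run head -3 Tweets/FlumeData.txt | python sentimentMapper

To sort it, I run head -3 Tweets/FlumeData.txt | python sentimentMapper | sort -k3

This currently sorts the data by the third column: all type1 rows, then all type2 rows. Ideally, I want to sort the data alphabetically by type and then numerically by value (in other words, all type1 rows from lowest to highest value, then all type2 rows from lowest to highest value).

I tried sort -k3 -k4n, but to no avail. How can I solve this?

Edit: desired output: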

2013-08-05-Mon 10:17:00 type1   0
2013-08-05-Mon 10:17:00 type1   0.190476190476
2013-08-05-Mon 10:17:00 type2   -0.2
2013-08-05-Mon 10:17:00 type2   0.1

2 answers:

Answer 0: (score: 1)

Try this:

LANG=C sort -k3,3 -k4,4n file

From info coreutils 'sort invocation':

`-k POS1[,POS2]'
`--key=POS1[,POS2]'
     Specify a sort field that consists of the part of the line between
     POS1 and POS2 (or the end of the line, if POS2 is omitted),
     _inclusive_.

     Each POS has the form `F[.C][OPTS]', where F is the number of the
     field to use, and C is the number of the first character from the
     beginning of the field.  Fields and character positions are
     numbered starting with 1; a character position of zero in POS2
     indicates the field's last character.  If `.C' is omitted from
     POS1, it defaults to 1 (the beginning of the field); if omitted
     from POS2, it defaults to 0 (the end of the field).  OPTS are
     ordering options, allowing individual keys to be sorted according
     to different rules; see below for details.  Keys can span multiple
     fields.

     Example:  To sort on the second field, use `--key=2,2' (`-k 2,2').
     See below for more notes on keys and more examples.  See also the
     `--debug' option to help determine the part of the line being used
     in the sort.
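
To make the difference between -k3 and -k3,3 concrete, here is a minimal Python sketch that mimics which part of a line each key covers. It is only an illustration, not how sort is implemented: the helper name key_text is hypothetical, and it splits on runs of whitespace, so it ignores the leading blanks that sort keeps inside a field by default.

```python
# Rough emulation of the text selected by a sort key; not GNU sort's code.
# Assumption: fields are separated by runs of whitespace (leading blanks
# that sort would keep inside a field are dropped here for simplicity).

def key_text(line, start, end=None):
    """Return the text covered by -k START[,END] (1-based, inclusive)."""
    fields = line.split()
    stop = len(fields) if end is None else end
    return " ".join(fields[start - 1:stop])

line = "2013-08-05-Mon 10:17:00 type1   0.190476190476"

print(key_text(line, 3))     # -k3   runs from field 3 to the end of the line
print(key_text(line, 3, 3))  # -k3,3 covers field 3 only
print(key_text(line, 4, 4))  # -k4,4 covers field 4 only
```

With -k3 alone, the numeric value is part of the first key and is compared as text, which is why the key has to be closed with -k3,3 before -k4,4n can decide the order within each type.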

As for LANG=C:

   (1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to.  In that case, set the `LC_ALL' environment
variable to `C'.  Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set.  Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value.  For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.

You may also want to look at this post: https://stackoverflow.com/a/5868546/465183

Answer 1: (score: 0)

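The -k3 option sorts on a single key defined as "starting at the first whitespace character after the second field and running to the end of the line", which is probably not what you want. What you probably want is a third-field key compared as text, followed by a fourth-field key compared numerically. Attach the n to the fourth key only: a global -n would also apply to the third key, where type1 and type2 both evaluate to 0 and so compare as equal.

sort -k3,3 -k4,4n file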

Adding the LANG=C bit that sputnik mentioned may also be useful.
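
Since the data already passes through a Python script, the same two-level ordering could also be done in Python. The following is a minimal sketch, assuming every line has exactly the four whitespace-separated fields shown in the question (date, time, type, value); the function name sort_key is hypothetical and is not part of the original sentimentMapper script.

```python
def sort_key(line):
    # Assumed layout: date, time, type, value, separated by whitespace.
    fields = line.split()
    # Compare the type as text first, then the value as a number.
    return (fields[2], float(fields[3]))

lines = [
    "2013-08-05-Mon 10:17:00 type1   0.190476190476",
    "2013-08-05-Mon 10:17:00 type1   0",
    "2013-08-05-Mon 10:17:00 type2   0.1",
    "2013-08-05-Mon 10:17:00 type2   -0.2",
]

sorted_lines = sorted(lines, key=sort_key)
for line in sorted_lines:
    print(line)
```

Python compares strings by code point, so the type column is ordered much as it would be under LANG=C, and converting the value with float() handles the negative numbers.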