I get output like this:
2013-08-05-Mon 10:17:00 type1 0.190476190476
2013-08-05-Mon 10:17:00 type1 0
2013-08-05-Mon 10:17:00 type2 0.1
2013-08-05-Mon 10:17:00 type2 -0.2
To get this output, I run head -3 Tweets/FlumeData.txt | python sentimentMapper
To sort it, I run head -3 Tweets/FlumeData.txt | python sentimentMapper | sort -k3
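This currently sorts the data by the third column: all type1 lines first, then type2. Ideally, I want to sort the data alphabetically and then numerically (in other words, all type1 lines from lowest value to highest, then all type2 lines from lowest value to highest).
I tried sort -k3 -k4n, but it did not help. How can I fix this?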
Edit: ideal output:
2013-08-05-Mon 10:17:00 type1 0
2013-08-05-Mon 10:17:00 type1 0.190476190476
2013-08-05-Mon 10:17:00 type2 -0.2
2013-08-05-Mon 10:17:00 type2 0.1
Answer 0 (score: 1)
Try this:
LANG=C sort -k3,3 -k4,4n file
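A minimal self-contained check of that command against the sample lines from the question. Two substitutions are made for illustration only: the lines are fed in with printf instead of being read from a file, and LC_ALL=C is used instead of LANG=C, because LC_ALL takes precedence over the other locale variables (see the coreutils locale note quoted in this answer).

```shell
#!/bin/sh
# Sample lines copied from the question (unsorted).
sample='2013-08-05-Mon 10:17:00 type1 0.190476190476
2013-08-05-Mon 10:17:00 type1 0
2013-08-05-Mon 10:17:00 type2 0.1
2013-08-05-Mon 10:17:00 type2 -0.2'

# -k3,3  : primary key is field 3 only (the type), compared as text.
# -k4,4n : secondary key is field 4 only (the score), compared numerically.
# LC_ALL=C forces plain byte-order comparison regardless of the user's locale.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -k3,3 -k4,4n)
printf '%s\n' "$sorted"
```

This should print the four lines in the order shown under "ideal output" in the question.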
From info coreutils 'sort invocation':
`-k POS1[,POS2]'
`--key=POS1[,POS2]'
Specify a sort field that consists of the part of the line between
POS1 and POS2 (or the end of the line, if POS2 is omitted),
_inclusive_.
Each POS has the form `F[.C][OPTS]', where F is the number of the
field to use, and C is the number of the first character from the
beginning of the field. Fields and character positions are
numbered starting with 1; a character position of zero in POS2
indicates the field's last character. If `.C' is omitted from
POS1, it defaults to 1 (the beginning of the field); if omitted
from POS2, it defaults to 0 (the end of the field). OPTS are
ordering options, allowing individual keys to be sorted according
to different rules; see below for details. Keys can span multiple
fields.
Example: To sort on the second field, use `--key=2,2' (`-k 2,2').
See below for more notes on keys and more examples. See also the
`--debug' option to help determine the part of the line being used
in the sort.
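The quoted text says a key whose POS2 is omitted runs to the end of the line. A small sketch of what that means in practice, using two hypothetical lines (not from the question) whose scores, 9 and 10, order differently as text ("10" before "9") and as numbers (9 before 10):

```shell
#!/bin/sh
# Hypothetical input lines chosen so text order and numeric order disagree.
data='2013-08-05-Mon 10:17:00 type1 10
2013-08-05-Mon 10:17:00 type1 9'

# -k3 has no POS2, so the first key is " type1 10" vs " type1 9" (field 3 to
# end of line). It is compared as text, '1' < '9' decides the order, and the
# -k4n tie-breaker is never consulted.
open_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k3 -k4n)

# -k3,3 ends the first key at the end of field 3, so both lines tie on it
# and -k4,4n then compares the scores numerically.
closed_key=$(printf '%s\n' "$data" | LC_ALL=C sort -k3,3 -k4,4n)

printf 'with -k3 -k4n:\n%s\n' "$open_key"
printf 'with -k3,3 -k4,4n:\n%s\n' "$closed_key"
```

Running the same commands with --debug added should underline the part of each line that every key covers.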
On LANG=C:
(1) If you use a non-POSIX locale (e.g., by setting `LC_ALL' to
`en_US'), then `sort' may produce output that is sorted differently
than you're accustomed to. In that case, set the `LC_ALL' environment
variable to `C'. Note that setting only `LC_COLLATE' has two problems.
First, it is ineffective if `LC_ALL' is also set. Second, it has
undefined behavior if `LC_CTYPE' (or `LANG', if `LC_CTYPE' is unset) is
set to an incompatible value. For example, you get undefined behavior
if `LC_CTYPE' is `ja_JP.PCK' but `LC_COLLATE' is `en_US.UTF-8'.
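A tiny illustration of what the C locale guarantees: strings are compared byte by byte, so '-' (0x2D) sorts before '0' (0x30). Other locales such as en_US.UTF-8 may apply collation rules that treat punctuation differently, so the same two lines can come out in another order there; that locale-dependent result is deliberately not shown, since it varies by system.

```shell
#!/bin/sh
# With LC_ALL=C (which overrides LANG and LC_COLLATE), comparison is plain
# byte order: "type2 -0.2" vs "type2 0.1" is decided by '-' (0x2D) < '0' (0x30).
c_order=$(printf '%s\n' 'type2 0.1' 'type2 -0.2' | LC_ALL=C sort)
printf '%s\n' "$c_order"
```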
You can also take a look at this answer: https://stackoverflow.com/a/5868546/465183
Answer 1 (score: 0)
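The -k3 option sorts by a single key defined as "starting at the first whitespace character after the second field and ending at the end of the line", which is probably not what you want. What you probably want is: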
sort -k3,3 -k4,4n file
Adding the LANG=C bit that sputnik mentioned may also be useful.
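One caveat worth checking when -n is written as a global option, as in sort -n -k3,3 -k4,4: in sort, a key with no ordering letters of its own inherits the global ones, so field 3 is then compared numerically too. "type1" and "type2" have no leading number and both compare as 0, so only the score decides and the two types get interleaved. A sketch on the question's sample lines, contrasted with attaching n to the second key only:

```shell
#!/bin/sh
# Sample lines copied from the question.
sample='2013-08-05-Mon 10:17:00 type1 0.190476190476
2013-08-05-Mon 10:17:00 type1 0
2013-08-05-Mon 10:17:00 type2 0.1
2013-08-05-Mon 10:17:00 type2 -0.2'

# Global -n: both keys are numeric; field 3 ties (all treated as 0), so the
# lines end up ordered by score alone, mixing type1 and type2.
global_n=$(printf '%s\n' "$sample" | LC_ALL=C sort -n -k3,3 -k4,4)

# Per-key n: field 3 stays a text comparison, only field 4 is numeric.
per_key=$(printf '%s\n' "$sample" | LC_ALL=C sort -k3,3 -k4,4n)

printf 'sort -n -k3,3 -k4,4:\n%s\n\n' "$global_n"
printf 'sort -k3,3 -k4,4n:\n%s\n' "$per_key"
```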