I have a large file (more than 1000 lines) that I need to sort by a specific criterion. The file contains lines like these:
bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
[2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
[2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
[2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}
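I need to sort them by the number that follows "took".
With awk I can extract and sort the numbers themselves like this: awk '{for(i=1;i<=NF;i++) if ($i=="took") print $(i+1)}' test | sort -h
But in the output I need the complete lines, only reordered, without losing anything. Unfortunately the ms values are not always in the same column (that would have been easy).
Solutions that call another interpreter (perl, python, etc.) will be accepted if they are better (faster, simpler, or more correct) than a native bash solution.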
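Answer 0 (score: 3)
A simple way to do this is to extract the value you want to sort on into a leading column, sort on it, and then remove that column in a later pipeline element.
So, as a first step: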
gawk 'match($0, /took ([[:digit:]]+)/, m) {printf("%s\t%s\n", m[1], $0)}'
This makes your stream look like the following (only lines that match "took <number>" are printed):
536 bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
531 bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
47 [2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
41 [2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
39 [2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}
...at which point you can pass it through sort -n to sort on the leading number, followed by a pipeline element that strips that leading value:
gawk 'match($0, /took ([[:digit:]]+)/, m) {printf("%s\t%s\n", m[1], $0)}' \
| sort -n | cut -d $'\t' -f 2-
...and we get the output:
[2017-07-26 1wergwg local took 39ms. {"uniqueId"wetgwgweqg gg}
[2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []
[2017-07-26 11:34:04.346533] wgwqegwqeg took 47ms. {qwgwqgce":"local","duration":47} []
bla took 531ms. {"uniqueId":"wdagweg","duration":531} []
bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []
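Since the question also allows Python, the same decorate, sort, undecorate idea can be sketched there. This is only an illustration, not part of the original answer: the function name sort_by_took and the sample list are made up for the example, and, like the gawk pattern, it skips lines that do not contain "took <number>".

```python
import re

# Same pattern as the gawk command: "took " followed by digits.
TOOK_RE = re.compile(r"took (\d+)")

def sort_by_took(lines):
    """Decorate each line with its number, sort, then strip the decoration."""
    decorated = []
    for line in lines:
        m = TOOK_RE.search(line)
        if m:  # assumption: non-matching lines are dropped, as in the gawk version
            decorated.append((int(m.group(1)), line))
    # list.sort is stable, so lines with equal numbers keep their input order.
    decorated.sort(key=lambda pair: pair[0])
    return [line for _, line in decorated]

sample = [
    'bla bla bla took 536ms. {"uniqueId":"ygfwyagf","duration":536} []',
    'bla took 531ms. {"uniqueId":"wdagweg","duration":531} []',
    '[2017-07-2 [bla] Aocal took 41ms. {"uniagwrqgwqrwqg ation":41} []',
]
for line in sort_by_took(sample):
    print(line)
```

If lines without a "took" value must survive as well, they would need a fallback sort key instead of being skipped.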
Answer 1 (score: 2)
With Perl, you can write
perl -e '
while (<>) {
if (/took (\d+)/) {
push @{$lines{$1}}, $_;
}
}
for $num (sort {$a <=> $b} keys %lines) {
print join("", @{$lines{$num}});
}
' file
Or, as line noise:
perl -lnE'/took (\d+)/&&push@{$l{$1}},$_}END{say@{$l{$_}}for sort{$a<=>$b}keys%l' file
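For readers who prefer Python, the grouping idea used by the Perl code (collect lines in a hash keyed by the number, then walk the keys in numeric order) might look roughly like this. The name group_and_sort is hypothetical, and one small difference is assumed here: keys are converted to integers, so "047" and "47" would land in the same group, whereas Perl hash keys are strings.

```python
import re
from collections import defaultdict

def group_and_sort(lines):
    """Group lines by the number after 'took', then emit groups in numeric order."""
    groups = defaultdict(list)
    for line in lines:
        m = re.search(r"took (\d+)", line)
        if m:  # lines without a match are ignored, as in the Perl version
            groups[int(m.group(1))].append(line)
    result = []
    for num in sorted(groups):       # numeric sort of the keys
        result.extend(groups[num])   # lines sharing a number keep input order
    return result
```

Storing a list per key is what keeps several lines with the same duration from overwriting each other.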
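Answer 2 (score: 1)
As an alternative, a more concise approach with gawk is to use the time value as the index of an array (tim) and then use the asorti function to sort those indices into a second array (tim1). The sorted indices in tim1 are then used to pull the lines back out of tim in order.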
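One point to check with this approach: as far as I know, gawk's asorti compares indices as strings unless a third argument such as "@ind_num_asc" is passed, so the order can differ from a numeric sort once the values have different digit counts. The small Python sketch below only illustrates that difference between string and numeric ordering; it is not the answer's gawk code, and the index "5" is a hypothetical value added to the sample durations to make the effect visible.

```python
# Indices as they would be stored in an associative array: strings.
indices = ["536", "531", "47", "41", "39", "5"]  # "5" is a made-up extra value

as_strings = sorted(indices)           # lexicographic comparison
as_numbers = sorted(indices, key=int)  # numeric comparison

print(as_strings)  # ['39', '41', '47', '5', '531', '536']
print(as_numbers)  # ['5', '39', '41', '47', '531', '536']
```

With only the five sample durations from the question the two orders happen to coincide, which is why the problem is easy to miss in a quick test.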