The lines in my test file look like this:
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
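I have to sort the lines by the timestamp field. I am using sed to extract the timestamp and am trying to put it in the first column with sed -e 's/((?<=\"timestamp\":)\d+.*?)/\1
Can anyone help fix the regexp?
Right now I get the error: sed: 1: "s/((?<=\"timestamp\":)\ ...": \1 not defined in the RE
I think the error comes from my regular expression.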
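Answer 0 (score: 1)
awk: this solution works for the general case where timestamp can appear anywhere in the line. It needs GNU awk (gawk), because FPAT and PROCINFO["sorted_in"] are gawk extensions: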
awk 'BEGIN {FPAT="\"timestamp\": *[0-9]*"; PROCINFO["sorted_in"]="@ind_num_asc" }
{ a[substr($1,13)]=$0 }
END { for(i in a) print a[i] }' <file>
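FPAT states that each line consists of a single field of the form "timestamp": nnnnnnnn. The PROCINFO["sorted_in"] setting makes every for (i in a) loop traverse the array in ascending numeric order of its keys. The second block strips the "timestamp": prefix from $1, uses the remaining number as the key, and stores the whole line under that key in the array. Finally, the END block prints the array. Note that lines sharing the same timestamp overwrite one another, so only the last of them is printed.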
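Answer 0 relies on gawk extensions. Where only a POSIX awk is available, the same key extraction can be sketched with match() and the ordering handed to sort. This is a swapped-in decorate/sort/undecorate technique, not the answer's own method. The temporary file and the variable names input and sorted are illustrative assumptions, and the sample lines are copied from this thread.

```shell
#!/bin/sh
# Sample lines copied from the thread, deliberately out of order.
input=$(mktemp)
cat > "$input" <<'EOF'
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}
EOF

# 1. Decorate: locate "timestamp": <digits> anywhere in the line and print
#    the digits, a tab, then the original line (lines without a match are skipped).
# 2. Sort numerically on the decorated first column.
# 3. Undecorate: drop the first tab-separated column again.
sorted=$(
  awk 'match($0, /"timestamp": *[0-9]+/) {
         key = substr($0, RSTART, RLENGTH)   # the matched "timestamp":<digits> text
         sub(/^"timestamp": */, "", key)     # keep only the digits
         print key "\t" $0
       }' "$input" |
  sort -n -k1,1 |
  cut -f2-
)
printf '%s\n' "$sorted"
rm -f "$input"
```

Unlike the array-based version, this keeps lines that share a timestamp, because nothing is stored under a unique key. It assumes the log lines themselves contain no tab characters.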
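Answer 1 (score: 1)
You can also use gawk for a quick implementation, without creating any intermediate columns.
Command:
awk -F'"timestamp":' '{a[substr($2,1,length($2)-1)]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}}' input
Explanation:
-F'"timestamp":'
defines "timestamp": as the field separator.
{a[substr($2,1,length($2)-1)]=$0}
for each line of the file, stores the whole line in an associative array, indexed by the timestamp value ($2 without its trailing }).
END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}}
at the end of processing, sorts the indices (timestamps) of the associative array into b and prints the stored lines in that order. The loop counts from 1 to n because for(i in b) does not guarantee any traversal order. asorti compares the indices as strings by default, which matches numeric order here because all timestamps have the same number of digits.
Input: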
$ more input
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}
Output:
$ awk -F'"timestamp":' '{a[substr($2,1,length($2)-1)]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}}' input
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
Answer 2 (score: 1)
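You can use the sort command:
sort -t: -k9,9n inputfile
Here -t: makes the colon : the field delimiter. The colon in timestamp": is the eighth colon in the line, so the timestamp value is the ninth field; -k9,9n restricts the sort key to that field and compares it numerically. (With -k8 the key would start at "tbl","timestamp", so the table name would influence the order.) This relies on every line having the same number of colons before the timestamp.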
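On the sed error in the question: POSIX sed has no lookbehind ((?<=...)), no \d, and no lazy .*?, and without -E the plain ( and ) are literal characters, so \1 refers to a group that was never defined. A minimal sketch of the "timestamp as first column" idea follows, assuming a sed that accepts -E (GNU and BSD sed do) and that every line of interest contains "timestamp": followed by digits. The temporary file and the variable names input and sorted are illustrative, and the sample lines are copied from this thread.

```shell
#!/bin/sh
# Two sample lines copied from the thread, deliberately out of order.
input=$(mktemp)
cat > "$input" <<'EOF'
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
EOF

# Group 1 is the whole line, group 2 the digits after "timestamp":.
# The replacement puts the digits and a space in front of the line;
# -n together with the p flag prints only lines where the substitution matched.
sorted=$(
  sed -nE 's/^(.*"timestamp":([0-9]+).*)$/\2 \1/p' "$input" |
  sort -n -k1,1 |      # numeric sort on the prepended column
  cut -d' ' -f2-       # remove the prepended column again
)
printf '%s\n' "$sorted"
rm -f "$input"
```

Dropping the final cut step leaves the timestamp in the first column, which is what the original sed attempt was aiming for.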