Sort file lines according to a regular expression

Time: 2018-06-11 07:31:53

Tags: regex linux sorting sed

The lines in my test file look like this:

2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}

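I have to sort the lines by the timestamp field. I am using sed to extract the timestamp and tried sed -e 's/((?<=\"timestamp\":)\d+.*?)/\1 to put it in the first column. Can anyone help fix the regular expression?

Right now I get the error: sed: 1: "s/((?<=\"timestamp\":)\ ...": \1 not defined in the RE. I think the error is caused by my regular expression.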

3 Answers:

Answer 0: (score: 1)

awk: this solution covers the general case, where timestamp can appear anywhere in the line:

awk 'BEGIN {FPAT="\"timestamp\": *[0-9]*"; PROCINFO["sorted_in"]="@ind_num_asc" }
     { a[substr($1,13)]=$0 }
     END { for(i in a) print a[i] }' <file>

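The BEGIN block declares that each line contains a single field of the form "timestamp": nnnnnnnn. It also tells awk to traverse all arrays in ascending numeric order of their keys. The second part strips the "timestamp": prefix from field $1; what remains becomes the key under which the line is stored in the array. Finally, we print the array. Both FPAT and PROCINFO["sorted_in"] are gawk extensions, so this needs gawk rather than a plain POSIX awk. Because the timestamp is the array key, lines that share a timestamp overwrite one another.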
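
For comparison, the same idea (put the timestamp in front, sort, then remove it again) can be sketched with sed, sort and cut, which is closer to the sed attempt in the question. sed's basic regular expressions have no lookbehind such as (?<=...) and no \d, and a capture group is written \(...\); with unescaped parentheses there is no group for \1 to refer to, which matches the error quoted in the question. The helper name sort_by_timestamp is made up for illustration, and the sketch assumes that every line contains "timestamp": followed by digits.

```shell
# Read log lines on stdin, write them to stdout ordered by "timestamp".
# sort_by_timestamp is a hypothetical helper name, not part of any tool.
# Assumption: every input line contains "timestamp": followed by digits.
sort_by_timestamp() {
    # 1. sed decorates: the captured digits go in front of the line
    #    (\1 is the capture group, & is the whole matched line).
    # 2. sort -n orders the lines numerically by that leading number.
    # 3. cut undecorates: it drops the first space-separated column again.
    sed 's/^.*"timestamp": *\([0-9][0-9]*\).*$/\1 &/' | sort -n | cut -d' ' -f2-
}

sorted=$(sort_by_timestamp <<'EOF'
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}
EOF
)
printf '%s\n' "$sorted"
```

Unlike the array-based version, this keeps lines that share a timestamp, because nothing is stored under a unique key.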

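Answer 1: (score: 1)

You can also do this quickly with gawk, without creating any intermediate columns.

Command:

awk -F'"timestamp":' '{a[substr($2,1,length($2)-1)]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}}' input

Explanation:

  • -F'"timestamp":' defines "timestamp": as the field separator.
  • {a[substr($2,1,length($2)-1)]=$0} runs on every line of the file: it stores the whole line in an associative array, indexed by the timestamp value ($2 without its trailing }).
  • END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}} runs at the end of processing: asorti (a gawk extension) sorts the indices of a, that is the timestamps, into b and returns their count, and the loop prints the lines in that order. asorti compares the indices as strings by default, which gives the right order as long as all timestamps have the same number of digits.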
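
To see what the array index described in the second bullet looks like, the following sketch prints only that index for one sample line. The variable names line and key are made up for illustration, and the sketch assumes that "timestamp": is the last member of the JSON object, so that $2 ends with the closing brace.

```shell
# One sample line from the question; single quotes keep the backslash literal.
line='2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}'

# With "timestamp": as the separator, $2 is the text after it: 1527495188024}
# substr($2, 1, length($2) - 1) drops the last character, the closing brace.
key=$(printf '%s\n' "$line" | awk -F'"timestamp":' '{ print substr($2, 1, length($2) - 1) }')

printf '%s\n' "$key"   # prints 1527495188024
```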

Input:

$ more input
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}

Output:

$ awk -F'"timestamp":' '{a[substr($2,1,length($2)-1)]=$0}END{n=asorti(a,b);for(i=1;i<=n;i++){print a[b[i]]}}' input
2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}
2018-05-28T17:14:08.025 {"operation":"UPDATE","primaryKey":{"easy_id":1235},"subSystem":"ts\est1","table":"tbl1","timestamp":1527495188025}
2018-05-28T17:15:08.026 {"operation":"DELETE","primaryKey":{"easy_id":1236},"subSystem":"ts\est2","table":"tbl1","timestamp":1527495188026}

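Answer 2: (score: 1)

You can use the sort command:

sort -t: -k9,9n inputfile

Here -t: makes the colon : the field separator. The colon in timestamp": is the eighth colon in the line, so the timestamp value is the ninth field, and -k9,9n sorts numerically on that field only. This relies on every line having the same number of colons before the timestamp.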
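
A quick way to check which colon-separated field holds the timestamp is to print the field count and the candidate field for one line. This is a minimal sketch with made-up variable names (line, nfields, field9); it assumes that no value in front of the timestamp contains a colon of its own.

```shell
# One sample line from the question; single quotes keep the backslash literal.
line='2018-05-28T17:13:08.024 {"operation":"INSERT","primaryKey":{"easy_id":1234},"subSystem":"ts\est","table":"tbl","timestamp":1527495188024}'

# Number of colon-separated fields in the line (colons + 1).
nfields=$(printf '%s\n' "$line" | awk -F: '{ print NF }')

# Field 9 is the text after the eighth colon.
field9=$(printf '%s\n' "$line" | cut -d: -f9)

printf '%s %s\n' "$nfields" "$field9"   # prints 9 1527495188024}
```

The trailing } does not disturb a numeric sort, because sort -n only reads the leading digits of the key.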