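Replace a pattern with part of the matched string on the same line in bash

Time: 2017-05-10 15:31:02

Tags: json awk sed stream jq

I have a file with one JSON object per line, in the following format: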

{"id":13, "url":"https://sub.domain.com/path", "dm":"-", "ip":"192.168.0.1"}
{"id":14, "url":"sub.domain2.com/?param=value", "dm":"-", "ip":"192.168.0.1"}
{"id":15, "url":"domain.com/path", "dm":"prefilled.com", "ip":"192.168.0.1"}

I need to replace "dm":"-" with the corresponding domain from the url on the same line, to get this output:

{"id":13, "url":"https://sub.domain.com/path", "dm":"sub.domain.com", "ip":"192.168.0.1"}
{"id":14, "url":"sub.domain2.com/?param=value", "dm":"sub.domain2.com", "ip":"192.168.0.1"}
{"id":15, "url":"domain.com/path", "dm":"prefilled.com", "ip":"192.168.0.1"}

Is there a bash command that touches only the lines containing "dm":"-" and does so efficiently? The file is more than 10k lines long.

3 Answers:

Answer 0: (score: 4)

With jq-1.5 (the latest version atm), you can do the following:

jq 'if .dm == "-" then .dm = (.url|sub("^https?://";"")|sub("/.*";"")) else . end' a.json

Explanation:

if .dm == "-" ...           # Runs the following only if .dm exists and its value is "-"
.dm=(...)                   # Assigns to .dm
.url|sub("^https?://"; "")  # Takes .url and replaces http/https:// from the beginning
...|sub("/.*"; "")          # Replaces everything after the first / (including it)
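By default jq pretty-prints each object over several lines, whereas the input here has one object per line. A minimal sketch that wraps the same filter with jq's -c (compact output) option so each result stays on one line; the function name fill_dm_jq is made up for illustration, and jq 1.5 or later is assumed to be on PATH. Note that compact output drops the spaces after the commas.

```shell
# Hypothetical helper (name is an assumption); needs jq 1.5+ for sub().
fill_dm_jq() {
  # -c emits one compact JSON object per line.
  # Only objects whose .dm is "-" are changed; all others pass through as-is.
  jq -c 'if .dm == "-"
         then .dm = (.url | sub("^https?://"; "") | sub("/.*"; ""))
         else . end' "$@"
}

# Usage: fill_dm_jq a.json > a.filled.json
```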

Answer 1: (score: 1)

With GNU or OSX sed, using -E for ERE support:

$ sed -E 's#(.*"url":"([^"]+\/\/)?([^"/]+).*"dm":")-"#\1\3"#' file
{"id":13, "url":"https://sub.domain.com/path", "dm":"sub.domain.com", "ip":"192.168.0.1"}
{"id":14, "url":"sub.domain2.com/?param=value", "dm":"sub.domain2.com", "ip":"192.168.0.1"}
{"id":15, "url":"domain.com/path", "dm":"prefilled.com", "ip":"192.168.0.1"}
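Since the question asks for the command to act only on lines that contain "dm":"-", one option is to put a sed address in front of the s command, so other lines are printed without the longer regex being tried against them. This is a sketch under the assumption that the field is always written exactly as "dm":"-" with no extra spaces; fill_dm_sed is a made-up name.

```shell
# Hypothetical helper (name is an assumption); works with GNU or BSD sed via -E.
fill_dm_sed() {
  # The /"dm":"-"/ address limits the substitution to lines with an empty dm.
  # Group 2 is an optional scheme such as https://, group 3 is the host.
  sed -E '/"dm":"-"/ s#(.*"url":"([^"]+//)?([^"/]+).*"dm":")-"#\1\3"#' "$@"
}

# Usage: fill_dm_sed file > file.filled
```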

With GNU awk, for the 3rd arg to match():

$ awk 'match($0,/(.*"url":"([^"]+\/\/)?([^"/]+).*"dm":")-(".*)/,a){$0=a[1] a[3] a[4]} 1' file
{"id":13, "url":"https://sub.domain.com/path", "dm":"sub.domain.com", "ip":"192.168.0.1"}
{"id":14, "url":"sub.domain2.com/?param=value", "dm":"sub.domain2.com", "ip":"192.168.0.1"}
{"id":15, "url":"domain.com/path", "dm":"prefilled.com", "ip":"192.168.0.1"}
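The third argument to match() is a gawk extension, so the command above will not run under mawk or BSD awk. A rough portable sketch follows, assuming each line has a single "url" field with no escaped quotes in it and that only http/https schemes need stripping (the gawk version strips any scheme). The name fill_dm_awk is hypothetical.

```shell
# Hypothetical helper (name is an assumption); uses only POSIX awk features.
fill_dm_awk() {
  awk '
    # Act only on lines with an empty dm that also carry a url field.
    index($0, "\"dm\":\"-\"") && match($0, /"url":"[^"]*"/) {
      # Skip the 7 characters of "url":" and drop the closing quote.
      host = substr($0, RSTART + 7, RLENGTH - 8)
      sub(/^https?:\/\//, "", host)   # strip a leading http:// or https://
      sub(/\/.*/, "", host)           # strip everything from the first slash
      sub(/"dm":"-"/, "\"dm\":\"" host "\"")
    }
    1                                 # print every line, changed or not
  ' "$@"
}

# Usage: fill_dm_awk file > file.filled
```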

Answer 2: (score: 0)

You can do this with sed, but if the format varies at all, I would recommend using something that actually understands the data:

sed -i -r 's/^(.*"url":")([^"]*\/\/)?([^"\/]+)(.*)"-"/\1\2\3\4"\3"/' your_file