Unique combinations of different values in JSON using jq

Time: 2017-02-27 06:25:23

Tags: arrays json bash unique jq

I have a JSON file (input.json) that looks like this:

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"} (repeat of line 5)

I would like to use jq to keep only the unique combinations of the values. The result should look like this:

{"header1":"a","header2":1a, "header3":1a, "header4":"apple"},
{"header1":"b","header2":2a, "header3":2a, "header4":"orange"}
{"header1":"c","header2":1a, "header3":2a, "header4":"banana"},
{"header1":"d","header2":2a, "header3":1a, "header4":"apple"},
{"header1":"a","header2":2a, "header3":1a, "header4":"banana"},
{"header1":"b","header2":1a, "header3":2a, "header4":"orange"},
{"header1":"b","header2":1a, "header3":1a, "header4":"orange"},
{"header1":"d","header2":1a, "header3":1a, "header4":"apple"}

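I tried grouping header1 with the other headers, but that did not produce unique results. I also tried unique, but it did not give the correct result either.

How can I get this? I am new to jq and have not found many tutorials on it.

Thanks

2 Answers:

Answer 0: (score: 0)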

  1. The sample lines you provided are not valid JSON. Since your preamble introduces them as JSON, the following assumes that you intended to present an array of JSON objects.

  2. The question is unclear in several respects, but from the example it looks as though unique might be what you are looking for, so consider:

    Invocation: jq -c 'unique[]' input.json

    Output:

    {"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
    {"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
    {"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
    {"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
    {"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
    {"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
    {"header1":"d","header2":"1a","header3":"1a","header4":"apple"}
    {"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
    
  3. If you need the output in some other format, you can do that with jq as well, but the requirements are not that clear, so let's leave that as an exercise :-)
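The invocation above expects input.json to hold a single JSON array. If the records are instead a stream of separate objects, unique needs them collected into an array first, which the -s (--slurp) option does. The following is a minimal sketch under that assumption; the three short records are illustrative and not the asker's data.

```shell
# Assumption: the records arrive as a stream of objects, one per line,
# rather than wrapped in a single JSON array.
records='{"header1":"b","header2":"2a"}
{"header1":"a","header2":"1a"}
{"header1":"b","header2":"2a"}'

# -s (--slurp) gathers the stream into one array so unique sees every record;
# -c prints each resulting object on a single line.
deduped=$(printf '%s\n' "$records" | jq -s -c 'unique[]')
printf '%s\n' "$deduped"
```

Note that unique returns its result sorted, so the record with header1 "a" is printed before the one with header1 "b" even though it came second in the input. If the original order matters, unique alone is not enough.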

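Answer 1: (score: 0)

As peak points out, your input is not legal JSON. I have taken the liberty of correcting it and converting it into a list of individual objects: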

{"header1":"a","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"b","header2":"2a", "header3":"2a", "header4":"orange"}
{"header1":"c","header2":"1a", "header3":"2a", "header4":"banana"}
{"header1":"d","header2":"2a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}
{"header1":"b","header2":"1a", "header3":"2a", "header4":"orange"}
{"header1":"b","header2":"1a", "header3":"1a", "header4":"orange"}
{"header1":"d","header2":"1a", "header3":"1a", "header4":"apple"}
{"header1":"a","header2":"2a", "header3":"1a", "header4":"banana"}

If this data is in data.json and you run

jq -M -s -f filter.jq data.json

with the following filter.jq

foreach .[] as $r (
  {}
; ($r | map(.)) as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)

it will generate the following output, in the original order and without duplicates.

{"header1":"a","header2":"1a","header3":"1a","header4":"apple"}
{"header1":"b","header2":"2a","header3":"2a","header4":"orange"}
{"header1":"c","header2":"1a","header3":"2a","header4":"banana"}
{"header1":"d","header2":"2a","header3":"1a","header4":"apple"}
{"header1":"a","header2":"2a","header3":"1a","header4":"banana"}
{"header1":"b","header2":"1a","header3":"2a","header4":"orange"}
{"header1":"b","header2":"1a","header3":"1a","header4":"orange"}
{"header1":"d","header2":"1a","header3":"1a","header4":"apple"}

Note that

($r | map(.))

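is used to generate an array containing only the values of each row, on the assumption that this always produces a unique key path. That is true of the sample data but may not be true of more complex values.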
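To make that caveat concrete, here is a small sketch with two made-up records (not from the question) that differ in their keys but carry the same values in the same order. Both collapse to the same value-only path, so the filter above treats the second one as already seen.

```shell
# Hypothetical records: different keys, identical values.
records='{"a":"x","b":"y"}
{"c":"x","d":"y"}'

# Each record reduces to the same value-only array, so the key paths collide.
paths=$(printf '%s\n' "$records" | jq -c 'map(.)')
printf '%s\n' "$paths"

# The value-path filter therefore drops the second record as a "duplicate".
kept=$(printf '%s\n' "$records" | jq -c -s '
  foreach .[] as $r (
    {};
    ($r | map(.)) as $p | if getpath($p) then empty else setpath($p; 1) end;
    $r
  )')
printf '%s\n' "$kept"
```

Only the first record survives, even though the two records are clearly not equal.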

Here is a slower but more robust filter.jq

foreach .[] as $r (
  {}
; [$r | tojson] as $p | if getpath($p) then empty else setpath($p;1) end
; $r
)

It uses the JSON representation of the entire row as a unique key to determine whether the row has been seen before.
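As a quick check of the difference, the sketch below runs this tojson-keyed filter on three made-up records: two that share values but not keys, followed by an exact repeat of the first. The records are illustrative assumptions, not the asker's data.

```shell
# Hypothetical records: the second shares values with the first but has
# different keys; the third is an exact repeat of the first.
records='{"a":"x","b":"y"}
{"c":"x","d":"y"}
{"a":"x","b":"y"}'

# Keying on the serialized row keeps the two distinct records and
# drops only the true repeat, preserving input order.
kept=$(printf '%s\n' "$records" | jq -c -s '
  foreach .[] as $r (
    {};
    [$r | tojson] as $p | if getpath($p) then empty else setpath($p; 1) end;
    $r
  )')
printf '%s\n' "$kept"
```

Because the key is the whole serialized row, records are only merged when their keys and values both match.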