Question

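I need to merge an array across a series of nested JSON files that have the same structure and share the same higher-level keys.

The goal is to create a merged file while preserving all existing higher-level keys and values.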

File 1:

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "groups": [
            {
              "GroupId": "123456",
              "GroupName": "foo"
            },
            {
              "GroupId": "234567",
              "GroupName": "bar"
            }
          ]
        }
      ]
    }
  ]
}

File 2:

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"                
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        }
      ]
    }
  ]
}

Expected result:

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "groups": [
            {
              "GroupId": "123456",
              "GroupName": "foo"
            },
            {
              "GroupId": "234567",
              "GroupName": "bar"
            }
          ]
        },
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        }
      ]
    }
  ]
}

Based on answers to other questions like this one, I tried the following without success:

jq -s '.[0] * .[1]' test1.json test2.json

jq -s add test1.json test2.json

jq -n '[inputs[]]' test{1,2}.json

The following successfully merges the arrays, but the higher-level keys and values are missing from the result.

jq -s '.[0].regions[0].services[0] * .[1].regions[0].services[0]' test1.json test2.json

I assume there is a simple jq solution that has eluded my searches. If not, any combination of jq and bash would be fine for a solution.

Answer 1

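Here is a solution that converts the arrays to objects down to the service level, merges them with *, and converts back to array form. If file1 and file2 contain the sample data, then this command: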

$ jq -Mn --argfile file1 file1 --argfile file2 file2 '
   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | {($a): {($r): $s}}             # generate object for each account,region,service
   # | debug                          # uncomment debug here to see stream
   ;
     reduce merge as $x ({}; . * $x)  # use * to recombine all the objects from merge

   # | debug                          # uncomment debug here to see combined object

   | keys[] as $a                     # for each key (account) of combined object
   | {account:$a, regions:[           #  construct object with {account, regions array}
        .[$a]                         #   for each account
      | keys[] as $r                  #    for each key (region) of account object
      | {region:$r, services:[        #     construct object with {region, services array}
           .[$r]                      #      for each region
         | keys[] as $s               #       for each service
         | {($s): .[$s]}              #         generate service object
        ]}                            #      add service objects to service array
      ]}'                             #   add region object to regions array

produces

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        },
        {
          "groups": [
            {
              "GroupId": "123456",
              "GroupName": "foo"
            },
            {
              "GroupId": "234567",
              "GroupName": "bar"
            }
          ]
        }
      ]
    }
  ]
}
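Note that the services come out as group_policies then groups, because keys returns sorted keys. As a hedged variant (not part of the original answer), the same filter can read its files through inputs instead of --argfile, which the jq manual discourages in favor of --slurpfile, and use keys_unsorted, which the manual describes as roughly insertion order. The temporary directory and the compact copies of the question's test1.json and test2.json below are only there to make the sketch self-contained:

```shell
# Scratch copies of the two sample files from the question (compact form).
tmp=$(mktemp -d)
cat > "$tmp/test1.json" <<'EOF'
{"account":"123456789012","regions":[{"region":"one","services":[
  {"groups":[{"GroupId":"123456","GroupName":"foo"},
             {"GroupId":"234567","GroupName":"bar"}]}]}]}
EOF
cat > "$tmp/test2.json" <<'EOF'
{"account":"123456789012","regions":[{"region":"one","services":[
  {"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                     {"GroupName":"bar","PolicyNames":["all_bar"]}]}]}]}
EOF

# -n plus inputs lets the filter consume any number of files given as arguments.
merged=$(jq -cn '
  reduce ( inputs
           | .account as $a
           | .regions[]
           | .region as $r
           | .services[] as $s
           | {($a): {($r): $s}} ) as $x ({}; . * $x)
  | keys_unsorted[] as $a
  | {account: $a, regions: [
      .[$a]
      | keys_unsorted[] as $r
      | {region: $r, services: [
          .[$r]
          | keys_unsorted[] as $s
          | {($s): .[$s]}
        ]}
    ]}
' "$tmp/test1.json" "$tmp/test2.json")
echo "$merged"
```

Because the files are passed as ordinary arguments, the same command should extend to more than two files of the same shape, for example "$tmp"/test*.json.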

Extended explanation

Assembling this step by step gives a better picture of how it works. Start with this filter:

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | $a
   ;
   merge

Because there are two objects (one from file1 and one from file2), this outputs .account from each:

"123456789012"
"123456789012"

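Note that .account as $a does not change the current value of . (dot). Variables let us "drill down" into sub-objects without losing the higher-level context. Consider this filter: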

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | [$a, $r]
   ;
   merge

which outputs (account, region) pairs:

["123456789012","one"]
["123456789012","one"]
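To see the variable binding in isolation, here is a minimal sketch on a single inline document. The second region, "two", is a hypothetical addition used only to show that $a stays available while . moves through .regions[]:

```shell
# One account with two regions; "two" is made up for illustration.
doc='{"account":"123456789012","regions":[{"region":"one"},{"region":"two"}]}'

# $a is bound once; . then becomes each region object in turn.
pairs=$(printf '%s' "$doc" | jq -c '.account as $a | .regions[] | [$a, .region]')
echo "$pairs"
# ["123456789012","one"]
# ["123456789012","two"]
```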

Now we can continue drilling down into the services:

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[]
     | [$a, $r, .]
   ;
   merge

At this point the third element of the array (.) refers to each successive service in the .services array, so this filter generates

["123456789012","one",{"groups":[{"GroupId":"123456","GroupName":"foo"},
                                 {"GroupId":"234567","GroupName":"bar"}]}]
["123456789012","one",{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                                         {"GroupName":"bar","PolicyNames":["all_bar"]}]}]

This (complete) merge function:

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | {($a): {($r): $s}}             # generate object for each account,region,service
   ;
   merge

generates the stream

{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
                                  {"GroupId":"234567","GroupName":"bar"}]}}}
{"123456789012":{"one":{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                                          {"GroupName":"bar","PolicyNames":["all_bar"]}]}}}

The important point is that these are objects, which can easily be merged with * in a reduce step:

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | {($a): {($r): $s}}             # generate object for each account,region,service
   ;
   reduce merge as $x ({}; . * $x)    # use '*' to recombine all the objects from merge

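reduce initializes its local state (.) to {} and then computes a new state for each result of the merge function by evaluating . * $x, recursively combining the objects that merge generated from $file1 and $file2, producing: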

{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
                                  {"GroupId":"234567","GroupName":"bar"}],
                        "group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                                          {"GroupName":"bar","PolicyNames":["all_bar"]}]}}}
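The behavior of * that this relies on can be checked on toy values. The sketch below uses made-up keys (a, x, y, k) and is only meant to show that * recurses into objects but replaces arrays with the right-hand value:

```shell
# Objects under the same key are merged recursively.
objs=$(jq -cn '{"a":{"x":1}} * {"a":{"y":2}}')
echo "$objs"    # {"a":{"x":1,"y":2}}

# Arrays under the same key are not concatenated; the right side wins.
arrs=$(jq -cn '{"a":[1,2]} * {"a":[3]}')
echo "$arrs"    # {"a":[3]}

# The same reduce pattern as above, on a two-element stream.
both=$(jq -cn 'reduce ({"k":{"groups":[1]}}, {"k":{"group_policies":[2]}}) as $x ({}; . * $x)')
echo "$both"    # {"k":{"groups":[1],"group_policies":[2]}}
```

The second case is also why the question's attempt with .[0] * .[1] loses data: both files have a regions array, so the one from the second file replaces the one from the first.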

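Note that * stops merging at the arrays held in the 'groups' and 'group_policies' keys. If we want to keep merging below that level, we can create more objects in the merge function. For example, consider this extension: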

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | (
         $s.groups[]? as $g
       | {($a): {($r): {groups: {($g.GroupId): $g}}}}
       ), (
         $s.group_policies[]? as $p
       | {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
       )
   ;
   merge

This merge goes deeper than the previous one, producing

{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"}}}}}
{"123456789012":{"one":{"groups":{"234567":{"GroupId":"234567","GroupName":"bar"}}}}}
{"123456789012":{"one":{"group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]}}}}}
{"123456789012":{"one":{"group_policies":{"bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}

What matters here is that the "groups" and "group_policies" keys now contain objects, which means that in this filter

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | (
         $s.groups[]? as $g
       | {($a): {($r): {groups: {($g.GroupId): $g}}}}
       ), (
         $s.group_policies[]? as $p
       | {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
       )
   ;
   reduce merge as $x ({}; . * $x)

the reduce with * merges the groups and group policies instead of overwriting them, generating:

{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"},
                                  "234567":{"GroupId":"234567","GroupName":"bar"}},
                        "group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
                                          "bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}

Putting this back into the original form takes a little more work, but not much:

   def merge:                         # merge function
       ($file1, $file2)               # process $file1 then $file2
     | .account as $a                 # save .account in $a
     | .regions[]                     # for each element of .regions
     | .region as $r                  # save .region in $r
     | .services[] as $s              # save each element of .services in $s
     | (
         $s.groups[]? as $g
       | {($a): {($r): {groups: {($g.GroupId): $g}}}}
       ), (
         $s.group_policies[]? as $p
       | {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
       )
   ;
   reduce merge as $x ({}; . * $x)

   | keys[] as $a                     # for each key (account) of combined object
   | {account:$a, regions:[           #  construct object with {account, regions array}
        .[$a]                         #   for each account
      | keys[] as $r                  #    for each key (region) of account object
      | {region:$r, services:[        #     construct object with {region, services array}
           .[$r]                      #      for each region
         |   {groups:         [.groups[]]}          # add groups to service
           , {group_policies: [.group_policies[]]}  # add group_policies to service
        ]}
      ]}
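The last step above, [.groups[]], drops the temporary GroupId keys and turns the keyed object back into an array. A minimal sketch on a small inline object (two entries only, chosen for brevity):

```shell
# Groups keyed by GroupId, as produced by the deeper merge.
keyed='{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"},"999":{"GroupId":"999","GroupName":"baz"}}}'

# .groups[] emits the values of the object; [...] collects them into an array.
listed=$(printf '%s' "$keyed" | jq -c '{groups: [.groups[]]}')
echo "$listed"
```

Keying by GroupId (or GroupName for policies) also means that an entry appearing in both files is merged into one, rather than listed twice.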

Now, with this version, suppose our file2 contains a group as well as group_policies, e.g.

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "groups": [
            {
              "GroupId": "999",
              "GroupName": "baz"
            }
          ]
        },
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"                
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        }
      ]
    }
  ]
}

The first version of this solution produces

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        },
        {
          "groups": [
            {
              "GroupId": "999",
              "GroupName": "baz"
            }
          ]
        }
      ]
    }
  ]
}

whereas this revised version produces

{
  "account": "123456789012",
  "regions": [
    {
      "region": "one",
      "services": [
        {
          "groups": [
            {
              "GroupId": "123456",
              "GroupName": "foo"
            },
            {
              "GroupId": "234567",
              "GroupName": "bar"
            },
            {
              "GroupId": "999",
              "GroupName": "baz"
            }
          ]
        },
        {
          "group_policies": [
            {
              "GroupName": "foo",
              "PolicyNames": [
                "all_foo",
                "all_bar"
              ]
            },
            {
              "GroupName": "bar",
              "PolicyNames": [
                "all_bar"
              ]
            }
          ]
        }
      ]
    }
  ]
}

Answer 2

Combining jq add with jq gives us:

jq '.hits.hits' logs.*.json | jq -s add

which merges all the hits.hits arrays from all the logs.*.json files into one big array.
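A small sketch of that pipeline on two made-up log files. The file contents and the id field are hypothetical and exist only so the command has something to run against:

```shell
# Two hypothetical log files, each with a hits.hits array.
tmp=$(mktemp -d)
echo '{"hits":{"hits":[{"id":1},{"id":2}]}}' > "$tmp/logs.1.json"
echo '{"hits":{"hits":[{"id":3}]}}'          > "$tmp/logs.2.json"

# First jq emits one array per file; the second slurps them and add concatenates.
all_hits=$(jq '.hits.hits' "$tmp"/logs.*.json | jq -cs add)
echo "$all_hits"    # [{"id":1},{"id":2},{"id":3}]

# A single invocation that should give the same array, using inputs.
one_pass=$(jq -cn '[inputs | .hits.hits[]]' "$tmp"/logs.*.json)
echo "$one_pass"    # [{"id":1},{"id":2},{"id":3}]
```

Note that this concatenates one array and discards the surrounding keys, so it fits log-style data better than the account/regions structure in the question.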
