Put all results into the same array

Time: 2018-06-04 10:26:26

Tags: json group-by jq

I've been struggling with this for hours, and I'm pretty sure there's something I'm missing.

Given this JSON:

[
{
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "1082": "4:Deadline Exceeded, trace",
    "1273": "/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/elasticsearch-transport-5.0.3/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'",
    "2492": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "3060": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
},
{
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "5570": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6103": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6188": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6695": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6721": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6728": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6747": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
},
{ 
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "6760": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6939": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6941": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6942": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6947": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "7201": "/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/elasticsearch-transport-5.0.3/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'",
    "7707": ", trace - [\"/opt/gitlab/embedded/service/gitlab-rails/ee/lib/gitlab/elastic/indexer.rb:64:in `run_indexer!'\"",
    "7787": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
}
]

I'm currently using jq to extract the failed_projects entries, but with

.[] | select(.failed_projects != null) | . as $object | {"failed_projects"}[]

I get the results split into separate objects:

{
"1082": "...",
...
}
{
"5570": "...",
...
}
{
"6760": "...",
...
}

What I want to achieve is to group together the IDs that share the same exception. For example:

[{
"Exception": "ReadTimeout",
 [{
   "ID": 2492,
   "ID": 3060
 }]
},
{
"Exception": "Deadline Exceeded",
 [{
   "ID": 1082
 }]
}]

1 answer:

Answer 0 (score: 1)

The illustrative output is not valid JSON, and it contains objects with repeated keys, which is probably not what you actually want. The following jq program, however, produces output consistent with the general problem description. Since you do not seem to have specified the precise grouping criterion, I have used the text of the error message after the last ":" as the grouping criterion. (If, for example, you wanted to use the text after the first ":" instead, use "^[^:]*: *" as the regex.)
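To see what each regex keeps, here is a quick cross-check in Python. It is only a sketch: it assumes Python's re module treats these two simple patterns the same way jq's sub does, and it applies them to one message taken from the question.

```python
import re

# One of the messages from the question's input.
message = ("/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': "
           "Net::ReadTimeout (Faraday::TimeoutError)")

# "^.*: *": the greedy .* runs to the LAST ":" (plus trailing spaces).
after_last_colon = re.sub(r"^.*: *", "", message, count=1)

# "^[^:]*: *": [^:]* cannot cross a ":", so only the FIRST ":" is consumed.
after_first_colon = re.sub(r"^[^:]*: *", "", message, count=1)

print(after_last_colon)   # TimeoutError)
print(after_first_colon)  # 176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)
```

Neither result is exactly the "ReadTimeout" label from the question, which is why the criterion will likely need adjusting.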

The first step gathers the .failed_projects objects together and applies to_entries so that we can readily access each ID and its error message text:

[.[] | .failed_projects | to_entries[]]
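For readers who want to check this step outside jq, a minimal Python analogue might look like the following. It assumes every record has a failed_projects object (as in the question's data) and uses a trimmed copy of the input; the function name to_entries is borrowed from jq for readability and is not a Python library call.

```python
import json

# Trimmed copy of the question's input: two records, three failed projects.
RAW = """
[
  {"LAST_JOB_POD": "gitlab-web-65-gwwwh",
   "failed_projects": {
     "1082": "4:Deadline Exceeded, trace",
     "2492": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
   }},
  {"LAST_JOB_POD": "gitlab-web-65-gwwwh",
   "failed_projects": {
     "5570": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
   }}
]
"""


def to_entries(records):
    """Rough analogue of [.[] | .failed_projects | to_entries[]].

    Returns one {"key": ID, "value": message} dict per failed project,
    in document order (json.loads keeps object keys in input order).
    """
    return [
        {"key": key, "value": value}
        for record in records
        for key, value in record["failed_projects"].items()
    ]


entries = to_entries(json.loads(RAW))
print(entries[0])  # {'key': '1082', 'value': '4:Deadline Exceeded, trace'}
```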

Next we extract the grouping criterion, and use it to form the groups:

| map(.value |= sub("^.*: *";""))
| group_by(.value)
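The same two operations can be sketched in Python as a cross-check, using a few entries from the question. jq's group_by sorts by the grouping value before collecting equal values, so the sketch sorts first and then uses itertools.groupby on adjacent items. Python's sort is stable, so IDs keep their input order within a group; the order inside groups in the jq output may differ.

```python
import re
from itertools import groupby

# A few flattened entries from the question (same shape as to_entries output).
entries = [
    {"key": "1082", "value": "4:Deadline Exceeded, trace"},
    {"key": "2492", "value": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"},
    {"key": "1273", "value": "/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/elasticsearch-transport-5.0.3/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'"},
    {"key": "3060", "value": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"},
]


def strip_to_criterion(message):
    # Same regex as sub("^.*: *";""): removes everything up to the LAST ":".
    return re.sub(r"^.*: *", "", message, count=1)


def group_entries(entries):
    # map(.value |= sub(...)): keep the ID, replace the message with the criterion.
    stripped = [{"key": e["key"], "value": strip_to_criterion(e["value"])} for e in entries]
    # group_by(.value): sort by the criterion, then collect adjacent equal values.
    stripped.sort(key=lambda e: e["value"])
    return [list(group) for _, group in groupby(stripped, key=lambda e: e["value"])]


groups = group_entries(entries)
for group in groups:
    print(group[0]["value"], [e["key"] for e in group])
```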

Finally, we transform the groups into JSON objects of the form {GROUP: ARRAY_OF_IDs}:

| map( .[0].value as $key
       | [.[] | .key] as $value
       | {($key): $value} )
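As a Python sketch of this last step, assuming groups already shaped like the output of group_by(.value) (the sample values below are the stripped criteria from the question's data), each group becomes a one-key object whose key is the shared criterion and whose value is the list of IDs:

```python
import json

# Groups in the shape group_by(.value) produces (messages already stripped).
groups = [
    [{"key": "1082", "value": "Deadline Exceeded, trace"}],
    [{"key": "2492", "value": "TimeoutError)"}, {"key": "3060", "value": "TimeoutError)"}],
    [{"key": "1273", "value": "in `__raise_transport_error'"}],
]


def groups_to_objects(groups):
    # .[0].value as $key | [.[] | .key] as $value | {($key): $value}
    return [{group[0]["value"]: [entry["key"] for entry in group]} for group in groups]


result = groups_to_objects(groups)
print(json.dumps(result, indent=2))
```

Using a computed key, like ($key) in jq, is what lets the error text itself become the object key.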

Putting the above fragments together in a file, program.jq, and using the invocation:

jq -f program.jq input.json

yields the output shown below. You will evidently want to modify the grouping criterion. You might also wish to convert the ID strings to JSON numbers, which can be done with tonumber or, more cautiously, with (tonumber? // .).
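The "cautious" conversion falls back to the original string when parsing fails. A rough Python analogue follows; it assumes the IDs are integers (so it uses int() rather than a general number parser), and the "n/a" entry is a hypothetical non-numeric ID added only to show the fallback.

```python
def to_number_or_keep(value):
    """Rough analogue of jq's (tonumber? // .).

    Assumption: IDs are integers, so int() is used; anything that does not
    parse (like the hypothetical "n/a" below) is returned unchanged.
    """
    try:
        return int(value)
    except ValueError:
        return value


# A {GROUP: ARRAY_OF_IDs} object with string IDs, plus one hypothetical bad ID.
group = {"TimeoutError)": ["2492", "3060", "n/a"]}
converted = {k: [to_number_or_keep(i) for i in ids] for k, ids in group.items()}
print(converted)  # {'TimeoutError)': [2492, 3060, 'n/a']}
```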

To understand program.jq, you might like to start with the first fragment, and then add each of the others in turn.

Output

[
  {
    "Deadline Exceeded, trace": [
      "1082"
    ]
  },
  {
    "TimeoutError)": [
      "6728",
      "6747",
      "6939",
      "5570",
      "6103",
      "6188",
      "6695",
      "6721",
      "2492",
      "6760",
      "3060",
      "6941",
      "6942",
      "6947",
      "7787"
    ]
  },
  {
    "in `__raise_transport_error'": [
      "1273",
      "7201"
    ]
  },
  {
    "in `run_indexer!'\"": [
      "7707"
    ]
  }
]