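I am trying to design some relational tables to hold the parsed output of various JSON streams. The streams have fairly complex structures, and to make table design easier I need to know the nested keys at every level of each stream. I am lost as to how to get every nested key from a stream using jq. Below is a simplified, representative JSON stream.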
{
"startAt": 0,
"total": 5315,
"issues": [
{
"id": "44269",
"name": "someName",
"fields": {
"fixVersions": [
{
"id": "11401",
"releaseDate": "2016-09-30"
}
],
"status": {
"id": "10110",
"statusCategory": {
"id": 3,
"name": "Done"
}
}
}
},
{
"id": "44270",
"key": "LEAD-XXXX",
"fields": {
"assignee": {
"id": "10111",
"name": "Don"
},
"status": {
"id": "10110",
"statusCategory": {
"id": 2,
"name": "inProgress"
}
}
}
}
]
}
I expect the following output. I would welcome a better approach that helps me with the table design.
startAt
total
issues: []
issues:id
issues:name
issues:key
issues:fields
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
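How can I get the nested keys of the stream above using jq? Many thanks for your help.
Answer 0: (score: 3)
"I would welcome a better approach ..."
If I were you, I would start (and perhaps end) with the following: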
[paths(scalars) | map(if type == "number" then 0 else . end)]
| unique
| .[]
With your sample input and the -cr command-line options, this produces:
["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]
["startAt"]
["total"]
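You can get closer to the output you said you wanted by mapping the number 0 to a string, but you have to watch out for collisions between that string and actual key names. For example: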
[paths(scalars) | map(if type == "number" then "[]" else . end)]
| unique
| .[]
| join(":")
produces:
issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:id
issues:[]:key
issues:[]:name
startAt
total
Note that this approach yields essentially the same results as the approach based on schema inference. That is a good thing.
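Using unique/0 as above has two potential drawbacks: (1) the ordering of the output does not reflect the ordering of the data; (2) efficiency (though in practice this is unlikely to be a real problem, except perhaps for JSON texts with a very large number of leaf paths).
In any case, INDEX/2 can be used in place of unique. If your jq does not have INDEX/2, its definition is included in the snippet that follows.
In brief: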
def INDEX(stream; idx_expr):
reduce stream as $row ({};
.[$row|idx_expr|
if type != "string" then tojson
else .
end] |= $row);
INDEX(paths(scalars)
| map(if type == "number" then "[]" else . end); .)
| .[]
| join(":")
yields:
startAt
total
issues:[]:id
issues:[]:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:key
issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
If you also want the paths to empty arrays to be reported, you could (for example) simply change "paths(scalars)" to "(paths(scalars), paths(arrays))".
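To cross-check the jq output outside jq, the same idea can be sketched in Python. This is only a minimal sketch and not part of the original answer: leaf_paths and unique_in_order are hypothetical helper names, and the inline document is a trimmed stand-in for the sample input. A dict keeps first-seen order, which is the role INDEX/2 plays above.

```python
import json

def leaf_paths(node, prefix=()):
    """Yield the path to every scalar leaf, writing "[]" for each array index."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for item in node:
            yield from leaf_paths(item, prefix + ("[]",))
    elif prefix:  # like jq's paths, never report the root itself
        yield prefix

def unique_in_order(paths):
    """Drop duplicate paths while keeping first-seen (data) order."""
    return list(dict.fromkeys(paths))

# Trimmed stand-in for the sample input (an assumption for illustration).
doc = json.loads("""
{"startAt": 0,
 "issues": [
   {"id": "44269", "fields": {"fixVersions": [{"id": "11401"}]}},
   {"id": "44270", "key": "LEAD-XXXX", "fields": {"assignee": {"name": "Don"}}}
 ]}
""")

lines = [":".join(path) for path in unique_in_order(leaf_paths(doc))]
print("\n".join(lines))
```

Sorting the list before joining would give the unique-style ordering instead of data order.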
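Answer 1: (score: 1)
If you want a schematic view of your data, you may wish to consider an approach based on schema inference.
For example, using the schema function defined at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed, your input yields the following inferred schema: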
{
"startAt": "number",
"total": "number",
"issues": [
{
"fields": {
"assignee": {
"id": "string",
"name": "string"
},
"fixVersions": [
{
"id": "string",
"releaseDate": "string"
}
],
"status": {
"id": "string",
"statusCategory": {
"id": "number",
"name": "string"
}
}
},
"id": "string",
"key": "string",
"name": "string"
}
]
}
If you filter this through paths(scalars), you get:
["startAt"]
["total"]
["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]
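Apart from ordering, these results are the same as those obtained with the more direct approach, which I would suggest validates both approaches.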
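For readers who want to see the idea of schema inference without reading the gist, here is a much simplified Python sketch. It is not the gist's schema function: infer, merge and type_name are hypothetical names, the "mixed" label for conflicting types is an assumption made for this example, and the inline document is a trimmed stand-in for the sample input.

```python
def type_name(value):
    """Map a JSON scalar to a type label."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not a JSON scalar: {value!r}")

def merge(a, b):
    """Combine two inferred schemas that describe the same position."""
    if a == b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, sub in b.items():
            merged[key] = merge(a[key], sub) if key in a else sub
        return merged
    if isinstance(a, list) and isinstance(b, list):
        if not a:
            return b
        if not b:
            return a
        return [merge(a[0], b[0])]
    return "mixed"  # assumed label for conflicting types

def infer(value):
    """Infer a schema: objects keep their keys, arrays collapse to one element."""
    if isinstance(value, dict):
        return {key: infer(sub) for key, sub in value.items()}
    if isinstance(value, list):
        schema = []
        for item in value:
            schema = merge(schema, [infer(item)])
        return schema
    return type_name(value)

# Trimmed stand-in for the sample input (an assumption for illustration).
doc = {
    "total": 5315,
    "issues": [
        {"id": "44269", "name": "someName", "fields": {"status": {"id": "10110"}}},
        {"id": "44270", "key": "LEAD-XXXX", "fields": {"assignee": {"id": "10111"}}},
    ],
}
schema = infer(doc)
```

Because every array collapses to a single merged element, keys that appear in only some issues (name, key, assignee) all end up in the one element schema.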
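Answer 2: (score: 1)
paths is definitely the right way to go, but getting the exact output you asked for is a bit fiddly. Apart from the exact ordering, here is a filter that does it: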
def normalize: # convert paths to requested structure
if .[-1]|type=="number" then .[-1]="[]" else . end
| map(select(type!="number"));
def collect: # collect unique normalized paths into an object
reduce (paths|normalize) as $p (
{}
; if getpath($p)==null then setpath($p;null) else . end
);
def colonize($p): # convert object back into : separated paths
keys_unsorted[] as $k
| (if $p=="" then $k else "\($p):\($k)" end) as $n
| $n, (.[$k] | if type=="object" then colonize($n) else empty end);
def summary: # final output without redundant foo: if foo:[] is present
[ collect | colonize("") ]
| map(select(endswith(":[]"))|.[:-3]) as $remove
| map(select($remove[[.]]==[]));
summary[]
Sample run (assuming the filter is in filter.jq and the data is in data.json):
$ jq -Mcr -f filter.jq data.json
startAt
total
issues:[]
issues:id
issues:name
issues:fields
issues:fields:fixVersions:[]
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
issues:key
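Note that there is a problem with empty arrays. If the data contains empty arrays, this filter reports them as ordinary fields, because the corresponding paths returned by paths do not end in a number. The simplest way to compensate is to first map empty arrays to something non-empty, such as [{}]. For example: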
def walk(f): # defined here in case your jq doesn't have it
. as $in
| if type == "object" then reduce keys_unsorted[] as $key (
{}; . + { ($key): ($in[$key] | walk(f)) } ) | f
elif type == "array" then map( walk(f) ) | f
else f
end;
walk(if .==[] then [{}] else . end)
| summary[]
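The same pre-processing step is easy to reproduce when checking results outside jq. The following Python sketch is an illustration only; pad_empty_arrays is a hypothetical name. It rebuilds the document bottom-up and replaces every empty array with [{}], so that a later path walk sees at least one element under each array.

```python
def pad_empty_arrays(node):
    """Return a copy of node in which every empty array becomes [{}]."""
    if isinstance(node, dict):
        return {key: pad_empty_arrays(value) for key, value in node.items()}
    if isinstance(node, list):
        # An empty list is falsy, so "or" substitutes the one-element stand-in.
        return [pad_empty_arrays(item) for item in node] or [{}]
    return node

padded = pad_empty_arrays({"a": [], "b": [1], "c": {"d": []}})
```

The input is left unmodified, since the function builds new containers rather than editing in place.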
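Answer 3: (score: 1)
Just for clarity: it is quite easy to write a jq filter that produces output in the format originally envisaged, even though that format is not very conventional.
The following approach does not need walk/1 to handle the special case of empty arrays. It uses unique only because INDEX/2 is not included in jq version 1.5 (*).
With the sample input and the -r command-line option, the following: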
[paths as $p
| if (getpath($p)|type) == "array" then $p + [" []"]
elif ($p[-1]|type) == "number" then empty
else $p
end
| map(select(type != "number"))]
| unique
| .[]
| join(":")
produces:
issues: []
issues:fields
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:id
issues:key
issues:name
startAt
total
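As a cross-check, a rough Python equivalent of this filter is sketched below. It is an assumption-laden illustration rather than a port: key_paths is a hypothetical name, the inline document is a trimmed stand-in for the sample input, and arrays nested directly inside other arrays (or at the top level) get no marker of their own, which is enough for inputs shaped like the sample.

```python
def key_paths(node, prefix=()):
    """Yield one path per object key; array-valued keys get a trailing " []"."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = prefix + (key,)
            # Arrays are reported with the marker, everything else as a plain key.
            yield path + (" []",) if isinstance(value, list) else path
            yield from key_paths(value, path)
    elif isinstance(node, list):
        for item in node:
            # Array indices are dropped, so items share the parent's path.
            yield from key_paths(item, prefix)

# Trimmed stand-in for the sample input (an assumption for illustration).
doc = {
    "total": 1,
    "issues": [
        {"id": "1", "fields": {"fixVersions": []}},
        {"key": "K"},
    ],
}

# sorted(set(...)) mirrors unique: duplicates removed, paths in sorted order.
lines = [":".join(path) for path in sorted(set(key_paths(doc)))]
print("\n".join(lines))
```

The empty fixVersions array is still reported with its " []" marker, so no padding step is needed here.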
(*) The use of unique can easily be avoided by using INDEX/2, as illustrated elsewhere on this page.