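I have a JSON object that I want to transform from one form to another using jq (of course, I could use JavaScript or Python and iterate, but jq would be preferable). The problem is that the input contains long arrays that need to be split into multiple smaller arrays whenever the values in the first two elements stop repeating. I'm not sure how best to describe the problem, so I'll just give an example here and hope it explains more. One safe assumption, if it helps at all, is that the input data is always pre-sorted on the first two elements (e.g. "row_x" and "col_y"):
Input: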
{
"headers": [ "col1", "col2", "col3" ],
"data": [
[ "row1","col1","b","src2" ],
[ "row1","col1","b","src1" ],
[ "row1","col1","b","src3" ],
[ "row1","col2","d","src4" ],
[ "row1","col2","e","src5" ],
[ "row1","col2","f","src6" ],
[ "row1","col3","j","src7" ],
[ "row1","col3","g","src8" ],
[ "row1","col3","h","src9" ],
[ "row1","col3","i","src10" ],
[ "row2","col1","l","src13" ],
[ "row2","col1","j","src11" ],
[ "row2","col1","k","src12" ],
[ "row2","col3","o","src15" ]
]
}
Desired output:
{
"headers": [ "col1", "col2", "col3" ],
"values": [
[["b","b","b"],["d","e","f"],["g","h","i","j"]],
[["j","k","l"],null,["o"]]
],
"sources": [
[["src1","src2","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
[["src11","src12","src13"],null,["src15"]]
]
}
Is this feasible in jq?
Update: a variation on this is to preserve the original data order, in which case the output would be:
{
"headers": [ "col1", "col2", "col3" ],
"values": [
[["b","b","b"],["d","e","f"],["j","g","h","i"]],
[["l","j","k"],null,["o"]]
],
"sources": [
[["src2","src1","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
[["src13","src11","src12"],null,["src15"]]
]
}
Answer 0 (score: 1)
Is it feasible? Sure!
First, you need to group the data by row and column. Then, using those groups, build your values/sources arrays.
.headers as $headers | .data
# make the data easier to access
| map({ row: .[0], col: .[1], val: .[2], src: .[3] })
# keep it sorted so they are in expected order in the end
| sort_by([.row,.col,.src])
# group by rows
| group_by(.row)
# create a map to each of the cols for easier access
| map(group_by(.col)
| reduce .[] as $col ({};
.[$col[0].col] = [$col[] | {val,src}]
)
)
# build the result
| {
headers: $headers,
values: map([.[$headers[]] | [.[]?.val]]),
sources: map([.[$headers[]] | [.[]?.src]])
}
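This produces the following result (missing columns come out as [] rather than null, and because the sources are sorted as strings, "src10" sorts before "src7"):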
{
"headers": [ "col1", "col2", "col3" ],
"values": [
[
[ "b", "b", "b" ],
[ "d", "e", "f" ],
[ "i", "j", "g", "h" ]
],
[
[ "j", "k", "l" ],
[],
[ "o" ]
]
],
"sources": [
[
[ "src1", "src2", "src3" ],
[ "src4", "src5", "src6" ],
[ "src10", "src7", "src8", "src9" ]
],
[
[ "src11", "src12", "src13" ],
[],
[ "src15" ]
]
]
}
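For readers who want to cross-check the result outside jq, here is a minimal sketch of the same group-then-build idea in Python (the alternative the question mentions). The function name group_then_build is hypothetical, and the sketch assumes the same plain string ordering on (row, col, src) that the jq filter uses.

```python
from itertools import groupby

doc = {
    "headers": ["col1", "col2", "col3"],
    "data": [
        ["row1", "col1", "b", "src2"], ["row1", "col1", "b", "src1"],
        ["row1", "col1", "b", "src3"], ["row1", "col2", "d", "src4"],
        ["row1", "col2", "e", "src5"], ["row1", "col2", "f", "src6"],
        ["row1", "col3", "j", "src7"], ["row1", "col3", "g", "src8"],
        ["row1", "col3", "h", "src9"], ["row1", "col3", "i", "src10"],
        ["row2", "col1", "l", "src13"], ["row2", "col1", "j", "src11"],
        ["row2", "col1", "k", "src12"], ["row2", "col3", "o", "src15"],
    ],
}


def group_then_build(doc):
    """Hypothetical helper mirroring the jq pipeline above."""
    headers = doc["headers"]
    # Sort on (row, col, src) as plain strings, like sort_by([.row,.col,.src]).
    items = sorted(doc["data"], key=lambda r: (r[0], r[1], r[3]))
    values, sources = [], []
    # groupby only merges adjacent items, which is fine after sorting.
    for _, row_items in groupby(items, key=lambda r: r[0]):
        # Map each column name to its items, like the reduce step.
        by_col = {}
        for item in row_items:
            by_col.setdefault(item[1], []).append(item)
        # A column with no data becomes [] here, as in the jq version.
        values.append([[i[2] for i in by_col.get(h, [])] for h in headers])
        sources.append([[i[3] for i in by_col.get(h, [])] for h in headers])
    return {"headers": headers, "values": values, "sources": sources}


result = group_then_build(doc)
```

Replacing by_col.get(h, []) with a lookup that returns None for a missing column would give null instead of [] for the empty cell.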
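Answer 1 (score: 0)
Since the primary data source here can be regarded as a two-dimensional matrix, it may be worth considering a matrix-oriented approach to the problem, especially if the intention is that empty rows in the input matrix should not simply be omitted, or if the number of columns in the matrix is not known in advance.
To make things a little more interesting, let's choose to represent an m x n matrix, M, as a JSON array of the form [m, n, a], where a is an array of arrays such that a[i][j] is the element of M in row i, column j.
First, let's define some basic matrix-oriented operations: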
def ij(i;j): .[2][i][j];
def set_ij(i;j;value):
def max(a;b): if a < b then b else a end;
.[0] as $m | .[1] as $n
| [max(i+1;$m), max(j+1;$n), (.[2] | setpath([i,j];value)) ];
The data source uses strings of the form "rowI" for row I and "colJ" for column J, so we define a matrix-update function accordingly:
def update_row_col( row; col; value):
((row|sub("^row";"")|tonumber) - 1) as $r
| ((col|sub("^col";"")|tonumber) - 1) as $c
| ij($r;$c) as $v
| set_ij($r; $c; if $v == null then [value] else $v + [value] end) ;
Given an array of items of the form ["rowI", "colJ", V, S], generate a matrix holding the value {"source": S, "value": V} at row I and column J:
def generate:
reduce .[] as $x ([0,0,null];
update_row_col( $x[0]; $x[1]; { "source": $x[3], "value": $x[2] }) );
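To make the [m, n, a] representation concrete, here is a rough Python sketch of the same three steps (cell access, cell update with growth, and building the matrix from the items). It is only an illustration under stated assumptions: the function names are borrowed from the jq definitions above, the empty matrix is written [0, 0, []] instead of [0, 0, null], and untouched rows and cells are padded with None.

```python
import re


def ij(M, i, j):
    # Cell (i, j) of M = [m, n, a], or None if it was never set.
    a = M[2]
    if i < len(a) and a[i] is not None and j < len(a[i]):
        return a[i][j]
    return None


def set_ij(M, i, j, value):
    # Return a new [m, n, a] with cell (i, j) set, growing m and n as needed.
    m, n, a = M
    a = [None if row is None else list(row) for row in a]
    while len(a) <= i:
        a.append(None)          # rows that were skipped stay None
    if a[i] is None:
        a[i] = []
    while len(a[i]) <= j:
        a[i].append(None)       # cells that were skipped stay None
    a[i][j] = value
    return [max(m, i + 1), max(n, j + 1), a]


def update_row_col(M, row, col, value):
    # "rowI" / "colJ" are 1-based labels; the matrix is 0-based.
    r = int(re.sub(r"^row", "", row)) - 1
    c = int(re.sub(r"^col", "", col)) - 1
    cell = ij(M, r, c) or []
    return set_ij(M, r, c, cell + [value])


def generate(items):
    M = [0, 0, []]
    for row, col, v, s in items:
        M = update_row_col(M, row, col, {"source": s, "value": v})
    return M


M = generate([
    ["row1", "col1", "b", "src2"],
    ["row1", "col1", "b", "src1"],
    ["row2", "col3", "o", "src15"],
])
```

With these three items, M records 2 rows and 3 columns, cell (0, 0) holds two entries in input order, and cell (1, 0) is still None.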
Now we turn to the desired output. The following filter extracts f from the input matrix, producing an array of arrays, with [] replaced by null:
def extract(f):
. as $m
| (reduce range(0; $m[0]) as $i
([];
. + ( reduce range(0; $m[1]) as $j
([];
. + [ $m | ij($i;$j) // [] | map(f) ]) ) ))
| map( if length == 0 then null else . end );
Putting it all together (generating the headers dynamically is left as an exercise for the interested reader):
{headers} +
(.data | generate
| { "values": extract(.value), "sources": extract(.source) } )
Output (a flat, row-major list of cells rather than one array per row):
{
"headers": [
"col1",
"col2",
"col3"
],
"values": [
[
"b",
"b",
"b"
],
[
"d",
"e",
"f"
],
[
"j",
"g",
"h",
"i"
],
[
"l",
"j",
"k"
],
null,
[
"o"
]
],
"sources": [
[
"src2",
"src1",
"src3"
],
[
"src4",
"src5",
"src6"
],
[
"src7",
"src8",
"src9",
"src10"
],
[
"src13",
"src11",
"src12"
],
null,
[
"src15"
]
]
}
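The output above lists the cells one after another, whereas the question nests them per row. One way to regroup such a flat, row-major list is to cut it into chunks of n cells. The following Python sketch shows that step in isolation; chunk_rows and n_cols are hypothetical names, and n_cols is assumed to equal the number of headers.

```python
def chunk_rows(cells, n_cols):
    # Cut a flat, row-major list of cells into rows of n_cols cells each.
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


flat_values = [
    ["b", "b", "b"], ["d", "e", "f"], ["j", "g", "h", "i"],
    ["l", "j", "k"], None, ["o"],
]
nested_values = chunk_rows(flat_values, 3)
```

Applied to the "values" list above with three columns, this gives the two-row nesting shown in the question's order-preserving variant.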
Answer 2 (score: 0)
Here is a solution that uses reduce, getpath, and setpath:
  .headers as $headers
| reduce .data[] as [$r,$c,$v,$s] (
{headers:$headers, values:{}, sources:{}}
; setpath(["values", $r, $c]; (getpath(["values", $r, $c]) // []) + [$v])
| setpath(["sources", $r, $c]; (getpath(["sources", $r, $c]) // []) + [$s])
)
| .values = [ .values[] | [ .[ $headers[] ] ] ]
| .sources = [ .sources[] | [ .[ $headers[] ] ] ]
Sample output (manually reformatted for readability):
{
"headers":["col1","col2","col3"],
"values":[[["b","b","b"],["d","e","f"],["j","g","h","i"]],
[["l","j","k"],null,["o"]]],
"sources":[[["src2","src1","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
[["src13","src11","src12"],null,["src15"]]]
}
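The same accumulate-by-path idea can be sketched in Python for comparison, with nested dicts standing in for setpath/getpath. This is an illustration rather than a translation of the jq program: reshape is a hypothetical name, and the row order relies on Python dicts preserving insertion order together with the input being pre-sorted by row.

```python
doc = {
    "headers": ["col1", "col2", "col3"],
    "data": [
        ["row1", "col1", "b", "src2"], ["row1", "col1", "b", "src1"],
        ["row1", "col1", "b", "src3"], ["row1", "col2", "d", "src4"],
        ["row1", "col2", "e", "src5"], ["row1", "col2", "f", "src6"],
        ["row1", "col3", "j", "src7"], ["row1", "col3", "g", "src8"],
        ["row1", "col3", "h", "src9"], ["row1", "col3", "i", "src10"],
        ["row2", "col1", "l", "src13"], ["row2", "col1", "j", "src11"],
        ["row2", "col1", "k", "src12"], ["row2", "col3", "o", "src15"],
    ],
}


def reshape(doc):
    """Hypothetical helper: accumulate values and sources by (row, col)."""
    headers = doc["headers"]
    values, sources = {}, {}
    for row, col, value, source in doc["data"]:
        # Append to the list at [row][col], creating it on first use.
        values.setdefault(row, {}).setdefault(col, []).append(value)
        sources.setdefault(row, {}).setdefault(col, []).append(source)
    return {
        "headers": headers,
        # cols.get(h) is None for a column with no data (null in JSON).
        "values": [[cols.get(h) for h in headers] for cols in values.values()],
        "sources": [[cols.get(h) for h in headers] for cols in sources.values()],
    }


result = reshape(doc)
```

Because nothing is sorted, each cell keeps the original input order, and a missing column yields None, matching the null in the sample output above.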