Using jq to split an array of elements into multiple arrays of varying length

Asked: 2016-09-28 06:06:21

Tags: javascript arrays jq

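I have a JSON object that I would like to convert from one form to another using jq (of course I could use javascript or python and iterate, but jq would be preferable). The problem is that the input contains long arrays that need to be split into multiple smaller arrays whenever the data in the leading elements stops repeating. I am not sure how to describe the problem well, so I will just give an example here, which hopefully explains more. A safe assumption, if it helps at all, is that the input data is always pre-sorted on the first two elements (e.g. "row_x" and "col_y"):

Input: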

{
  "headers": [ "col1", "col2", "col3" ],
  "data": [
    [ "row1","col1","b","src2" ],
    [ "row1","col1","b","src1" ],
    [ "row1","col1","b","src3" ],
    [ "row1","col2","d","src4" ],
    [ "row1","col2","e","src5" ],
    [ "row1","col2","f","src6" ],
    [ "row1","col3","j","src7" ],
    [ "row1","col3","g","src8" ],
    [ "row1","col3","h","src9" ],
    [ "row1","col3","i","src10" ],
    [ "row2","col1","l","src13" ],
    [ "row2","col1","j","src11" ],
    [ "row2","col1","k","src12" ],
    [ "row2","col3","o","src15" ]
  ]
}

Desired output:

{
  "headers": [ "col1", "col2", "col3" ],
  "values": [
    [["b","b","b"],["d","e","f"],["g","h","i","j"]],
    [["j","k","l"],null,["o"]]
  ],
  "sources": [
    [["src1","src2","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
    [["src11","src12","src13"],null,["src15"]]
  ]
}

Is this feasible in jq?

Update: a variant of this problem is to preserve the original data order, so that the output would be as follows:

{
  "headers": [ "col1", "col2", "col3" ],
  "values": [
    [["b","b","b"],["d","e","f"],["j","g","h","i"]],
    [["l","j","k"],null,["o"]]
  ],
  "sources": [
    [["src2","src1","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
    [["src13","src11","src12"],null,["src15"]]
  ]
}

3 Answers:

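Answer 0 (score: 1)

Is it feasible? Of course!

First, you need to group the data by row and by column. Then, using those groups, build up your values and sources arrays.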

.headers as $headers | .data
    # make the data easier to access
    | map({ row: .[0], col: .[1], val: .[2], src: .[3] })
    # keep it sorted so they are in expected order in the end
    | sort_by([.row,.col,.src])
    # group by rows
    | group_by(.row)
    # create a map to each of the cols for easier access
    | map(group_by(.col)
        | reduce .[] as $col ({};
            .[$col[0].col] = [$col[] | {val,src}]
        )
    )
    # build the result
    | {
        headers: $headers,
        values: map([.[$headers[]] | [.[]?.val]]),
        sources: map([.[$headers[]] | [.[]?.src]])
    }

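This yields the following result. Note that the sort on .src is a plain string comparison, so "src10" comes before "src7", and that a missing column comes out as [] rather than null: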

{
  "headers": [ "col1", "col2", "col3" ],
  "values": [
    [
      [ "b", "b", "b" ],
      [ "d", "e", "f" ],
      [ "i", "j", "g", "h" ]
    ],
    [
      [ "j", "k", "l" ],
      [],
      [ "o" ]
    ]
  ],
  "sources": [
    [
      [ "src1", "src2", "src3" ],
      [ "src4", "src5", "src6" ],
      [ "src10", "src7", "src8", "src9" ]
    ],
    [
      [ "src11", "src12", "src13" ],
      [],
      [ "src15" ]
    ]
  ]
}
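
For readers who want to cross-check the group-then-build idea outside jq, here is a minimal sketch in Python (which the question mentions as an alternative). It is not a translation endorsed by the answer's author; the function name group_then_build is made up for illustration. It sorts on (row, col, src) as plain strings, groups by row and then by column, and looks up each header per row.

```python
from itertools import groupby

headers = ["col1", "col2", "col3"]
data = [
    ["row1", "col1", "b", "src2"], ["row1", "col1", "b", "src1"],
    ["row1", "col1", "b", "src3"], ["row1", "col2", "d", "src4"],
    ["row1", "col2", "e", "src5"], ["row1", "col2", "f", "src6"],
    ["row1", "col3", "j", "src7"], ["row1", "col3", "g", "src8"],
    ["row1", "col3", "h", "src9"], ["row1", "col3", "i", "src10"],
    ["row2", "col1", "l", "src13"], ["row2", "col1", "j", "src11"],
    ["row2", "col1", "k", "src12"], ["row2", "col3", "o", "src15"],
]


def group_then_build(headers, data):
    """Hypothetical helper: group by row, then column, then emit per-header cells."""
    # Sort on (row, col, src) as strings, like sort_by([.row,.col,.src]) in jq.
    ordered = sorted(data, key=lambda r: (r[0], r[1], r[3]))
    values, sources = [], []
    for _, row_items in groupby(ordered, key=lambda r: r[0]):
        # Map each column label to the list of entries seen in this row.
        by_col = {col: list(cells)
                  for col, cells in groupby(row_items, key=lambda r: r[1])}
        # A header with no entries yields an empty list, as in the jq output.
        values.append([[c[2] for c in by_col.get(h, [])] for h in headers])
        sources.append([[c[3] for c in by_col.get(h, [])] for h in headers])
    return {"headers": headers, "values": values, "sources": sources}


result = group_then_build(headers, data)
```

Because the sort key is compared as text, the col3 cell of row1 comes out as i, j, g, h (sources src10, src7, src8, src9), which is the same ordering the jq program produces.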

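Answer 1 (score: 0)

Since the primary data source here can be regarded as a kind of two-dimensional matrix, it may be worth considering a matrix-oriented approach to the problem, especially if the intention is that empty rows in the input matrix are not simply omitted, or if the number of columns in the matrix is not known in advance.

To spice things up a little, let us choose to represent an m x n matrix M as a JSON array of the form [m, n, a], where a is an array of arrays such that a[i][j] is the element of M in row i, column j.

First, let us define some basic matrix-oriented operations: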

def ij(i;j): .[2][i][j];

def set_ij(i;j;value):
  def max(a;b): if a < b then b else a end;
  .[0] as $m | .[1] as $n
  | [max(i+1;$m), max(j+1;$n), (.[2] | setpath([i,j];value)) ];

The data source uses strings of the form "rowI" for row i and "colJ" for column j, so we define a matrix-update function accordingly:

def update_row_col( row; col; value):
  ((row|sub("^row";"")|tonumber) - 1) as $r
   | ((col|sub("^col";"")|tonumber) - 1) as $c
   | ij($r;$c) as $v
   | set_ij($r; $c; if $v == null then [value] else $v + [value] end) ;

Given an array of items of the form ["rowI", "colJ", V, S], generate a matrix that holds the value {"source": S, "value": V} at row I and column J:

def generate:
  reduce .[] as $x ([0,0,null];
    update_row_col( $x[0];  $x[1]; { "source": $x[3], "value": $x[2] }) );
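
As a rough cross-check of the [m, n, a] representation, the following Python sketch mirrors set_ij, update_row_col, and generate. The helper name index_of is invented here, and one detail is an assumption: rows that are skipped are padded with empty lists, whereas jq's setpath pads with null.

```python
import re


def set_ij(matrix, i, j, value):
    """Return a new [m, n, a] with a[i][j] = value, growing the grid as needed."""
    m, n, a = matrix
    a = [list(row) for row in (a or [])]  # copy so the input is not mutated
    while len(a) <= i:
        a.append([])           # assumption: pad skipped rows with [] (jq uses null)
    while len(a[i]) <= j:
        a[i].append(None)      # pad missing cells in this row with None
    a[i][j] = value
    return [max(i + 1, m), max(j + 1, n), a]


def index_of(label, prefix):
    """Hypothetical helper: "row3" with prefix "row" gives the zero-based index 2."""
    return int(re.sub("^" + prefix, "", label)) - 1


def generate(items):
    """Fold ["rowI", "colJ", V, S] items into an [m, n, a] matrix of cell lists."""
    matrix = [0, 0, None]
    for row, col, value, source in items:
        i, j = index_of(row, "row"), index_of(col, "col")
        a = matrix[2] or []
        current = a[i][j] if i < len(a) and j < len(a[i]) else None
        cell = (current or []) + [{"source": source, "value": value}]
        matrix = set_ij(matrix, i, j, cell)
    return matrix


matrix = generate([
    ["row1", "col1", "b", "src2"],
    ["row1", "col1", "b", "src1"],
    ["row2", "col3", "o", "src15"],
])
```

Here matrix[0] and matrix[1] track the largest row and column index seen so far, so the dimensions do not have to be known before the data is read.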

Now we turn to the desired output. The following filter extracts f from the input matrix, producing an array of arrays, with [] replaced by null:

def extract(f):
  . as $m
  | (reduce range(0; $m[0]) as $i
      ([];
       . + ( reduce range(0; $m[1]) as $j
            ([];
         . + [ $m | ij($i;$j) // [] | map(f) ]) ) ))
  | map( if length == 0 then null else . end );

Putting it all together (generating the headers dynamically is left as an exercise for the interested reader):

{headers} +
  (.data | generate
   | { "values": extract(.value), "sources": extract(.source) } )

Output (the cells form a single flat list in row-major order):

{
  "headers": [ "col1", "col2", "col3" ],
  "values": [
    [ "b", "b", "b" ],
    [ "d", "e", "f" ],
    [ "j", "g", "h", "i" ],
    [ "l", "j", "k" ],
    null,
    [ "o" ]
  ],
  "sources": [
    [ "src2", "src1", "src3" ],
    [ "src4", "src5", "src6" ],
    [ "src7", "src8", "src9", "src10" ],
    [ "src13", "src11", "src12" ],
    null,
    [ "src15" ]
  ]
}
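
The output of this answer lists the cells one after another rather than nesting them per row as the question asks. If the nested shape is wanted, one possible variation, sketched here in Python rather than jq, is to collect each row into its own list. The name extract_nested and the small sample matrix are illustrative assumptions, not part of the answer.

```python
def extract_nested(matrix, field):
    """Pull `field` out of every cell of an [m, n, a] matrix, one list per row."""
    m, n, a = matrix
    a = a or []

    def cell(i, j):
        # Rows or cells that were never set count as missing.
        row = a[i] if i < len(a) and a[i] is not None else []
        return row[j] if j < len(row) else None

    result = []
    for i in range(m):
        row_out = []
        for j in range(n):
            items = cell(i, j)
            # Empty or missing cells become None (null in JSON).
            row_out.append([item[field] for item in items] if items else None)
        result.append(row_out)
    return result


# A small hand-written matrix in the [m, n, a] form, for illustration only.
sample = [2, 3, [
    [[{"source": "src2", "value": "b"}, {"source": "src1", "value": "b"}]],
    [None, None, [{"source": "src15", "value": "o"}]],
]]

nested_values = extract_nested(sample, "value")
nested_sources = extract_nested(sample, "source")
```

With this sample, every row yields exactly three entries, one per column, so gaps stay visible as None instead of shifting later cells to the left.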

Answer 2 (score: 0)

Here is a solution that uses reduce, getpath, and setpath:

  .headers as $headers
| reduce .data[] as [$r,$c,$v,$s] (
    {headers:$headers, values:{}, sources:{}}
  ; setpath(["values",  $r, $c]; (getpath(["values", $r, $c])  // []) + [$v])
  | setpath(["sources", $r, $c]; (getpath(["sources", $r, $c]) // []) + [$s])
  )
| .values  = [ .values[]  | [ .[ $headers[] ] ] ]
| .sources = [ .sources[] | [ .[ $headers[] ] ] ]

Sample output (manually reformatted for readability):

{
 "headers":["col1","col2","col3"],
 "values":[[["b","b","b"],["d","e","f"],["j","g","h","i"]],
           [["l","j","k"],null,["o"]]],
 "sources":[[["src2","src1","src3"],["src4","src5","src6"],["src7","src8","src9","src10"]],
            [["src13","src11","src12"],null,["src15"]]]
}
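
The same accumulate-by-path idea can be sketched in Python for comparison, with dict.setdefault standing in for the getpath/setpath pair. This is an illustrative sketch, not the answer author's code; the function name regroup is made up, and it relies on Python dictionaries keeping insertion order and on the input being pre-sorted by row, as the question states.

```python
def regroup(doc):
    """Hypothetical helper: append each value and source under its (row, col) path."""
    headers = doc["headers"]
    values, sources = {}, {}
    for row, col, value, source in doc["data"]:
        # setdefault creates the row dict and the cell list on first use.
        values.setdefault(row, {}).setdefault(col, []).append(value)
        sources.setdefault(row, {}).setdefault(col, []).append(source)
    return {
        "headers": headers,
        # .get(h) returns None for a column that never occurred in a row.
        "values": [[cols.get(h) for h in headers] for cols in values.values()],
        "sources": [[cols.get(h) for h in headers] for cols in sources.values()],
    }


doc = {
    "headers": ["col1", "col2", "col3"],
    "data": [
        ["row1", "col1", "b", "src2"], ["row1", "col1", "b", "src1"],
        ["row1", "col1", "b", "src3"], ["row1", "col2", "d", "src4"],
        ["row1", "col2", "e", "src5"], ["row1", "col2", "f", "src6"],
        ["row1", "col3", "j", "src7"], ["row1", "col3", "g", "src8"],
        ["row1", "col3", "h", "src9"], ["row1", "col3", "i", "src10"],
        ["row2", "col1", "l", "src13"], ["row2", "col1", "j", "src11"],
        ["row2", "col1", "k", "src12"], ["row2", "col3", "o", "src15"],
    ],
}

regrouped = regroup(doc)
```

Since nothing is sorted, the entries keep their original input order, which corresponds to the variant described in the question's update.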