Question

[{"foo": 1},
 {"foo": 2},
 {"foo": 3},
 {"foo": 4},
 {"foo": 5},
 {"foo": 6},
 {"foo": 7},
 {"foo": 8},
 {"foo": 9},
 {"foo": 10},
 {"foo": 11},
 {"foo": 12},
 {"foo": 13},
 {"foo": 14},
 {"foo": 15}
]

I would like to use jq to split this array into files that each contain a smaller array.

So far I have tried this:

 cat foo.json | jq -c -M -s '.[]' | split -l 5 - charded/

This produces 3 separate files, but the dictionaries in each file are not wrapped in an array.

Answer 1

jq's I/O facilities are rather primitive, so I'd suggest starting with the following filter, saved as chunk.jq:

def chunks(n):
  def c: .[0:n], (if length > n then .[n:]|c else empty end);
  c;

chunks(5)

The key now is to use the -c command-line option:

jq -c -f chunk.jq foo.json

With your data, this produces a stream of three arrays, one per line.
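
To make the "one array per line" claim concrete, here is a minimal self-contained sketch. It assumes jq 1.5 or later; the input is generated with range instead of being read from foo.json, and the variable names input and chunked are only illustrative.

```shell
# Build the question's 15-element array (assumption: generated here
# rather than read from foo.json, so the example is self-contained).
input=$(jq -c -n '[range(1; 16) | {foo: .}]')

# Same chunks(n) definition as in chunk.jq, inlined.
chunked=$(printf '%s\n' "$input" | jq -c '
  def chunks(n):
    def c: .[0:n], (if length > n then .[n:] | c else empty end);
    c;
  chunks(5)')

printf '%s\n' "$chunked"
# [{"foo":1},{"foo":2},{"foo":3},{"foo":4},{"foo":5}]
# [{"foo":6},{"foo":7},{"foo":8},{"foo":9},{"foo":10}]
# [{"foo":11},{"foo":12},{"foo":13},{"foo":14},{"foo":15}]
```

Because -c prints each result on a single line, line-oriented tools can treat every line as one complete output file.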

You can pipe this into split, awk, or any similar tool to send each line to a separate file, e.g.

awk '{n++; print > "out" n ".json"}'
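
For reference, the whole pipeline might look like the sketch below. The scratch directory (workdir, created with mktemp) and the generated input are assumptions made for the sake of a runnable example. The output filename is built in a variable first because awk implementations differ in how they parse an unparenthesized concatenation after >, and close() keeps the number of open files bounded when there are many chunks.

```shell
# Scratch directory for the output files (assumption: mktemp is available).
workdir=$(mktemp -d)

# Generate the sample array, chunk it, and write one chunk per file.
jq -c -n '
  def chunks(n):
    def c: .[0:n], (if length > n then .[n:] | c else empty end);
    c;
  [range(1; 16) | {foo: .}] | chunks(5)' |
  awk -v dir="$workdir" '{
    file = dir "/out" NR ".json"   # out1.json, out2.json, ...
    print > file
    close(file)                    # release the file handle before the next line
  }'

ls "$workdir"
```

With the 15-element sample this should leave out1.json, out2.json and out3.json in the scratch directory, each holding one five-element array.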

If you want the array in each file to be pretty-printed, you can then run jq on each file, perhaps with sponge, like so:

for f in out*.json ; do jq . "$f" | sponge "$f" ; done
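
sponge comes from the moreutils package and may not be installed. If it is missing, a temporary file does the same job; the sketch below assumes mktemp is available, and the directory and sample file are made up for the demonstration.

```shell
# Demo setup: one compact chunk file in a scratch directory (hypothetical data).
prettydir=$(mktemp -d)
printf '%s\n' '[{"foo":1},{"foo":2}]' > "$prettydir/out1.json"

# Pretty-print each file in place without sponge.
for f in "$prettydir"/out*.json; do
  tmp=$(mktemp)
  # Replace the original only if jq succeeded, so a parse error never truncates it.
  jq . "$f" > "$tmp" && mv "$tmp" "$f"
done

cat "$prettydir/out1.json"
```

Writing to a temporary file first matters because redirecting jq's output straight back onto its own input file would empty the file before jq reads it.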

def-free solution

If you would rather not define a function, or prefer a one-liner for the jq part of the pipeline, consider the following:

jq -c --argjson n 5 'recurse(.[$n:]; length > 0) | .[0:$n]' foo.json
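
To see why the one-liner works, it can help to run its two stages separately on a small array. This is only an illustrative sketch with a chunk size of 3 and made-up data; it assumes jq 1.5 or later for the two-argument form of recurse.

```shell
# Stage 1: recurse alone emits the input, then each successive tail
# for as long as the tail is non-empty.
tails=$(jq -c -n --argjson n 3 \
  '[1,2,3,4,5,6,7] | recurse(.[$n:]; length > 0)')
printf '%s\n' "$tails"
# [1,2,3,4,5,6,7]
# [4,5,6,7]
# [7]

# Stage 2: keeping only the first $n items of each tail yields the chunks.
pieces=$(jq -c -n --argjson n 3 \
  '[1,2,3,4,5,6,7] | recurse(.[$n:]; length > 0) | .[0:$n]')
printf '%s\n' "$pieces"
# [1,2,3]
# [4,5,6]
# [7]
```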

Notes

chunks also works on strings.
chunks defines the 0-arity inner function c in order to take advantage of jq's support for tail-call optimization.
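
A quick check of the string case, as a sketch assuming jq 1.5 or later (the sample string is arbitrary):

```shell
# chunks relies only on slicing and length, both of which jq defines for strings.
parts=$(jq -c -n '
  def chunks(n):
    def c: .[0:n], (if length > n then .[n:] | c else empty end);
    c;
  "abcdefgh" | chunks(3)')

printf '%s\n' "$parts"
# "abc"
# "def"
# "gh"
```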

Answer 2

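If data.json is very large (e.g., too large to fit comfortably in RAM), and you have a version of jq that includes the so-called streaming parser, you can first use jq to split data.json into its top-level elements, then regroup them, and finally use awk, split, or whatever else is described elsewhere on this page.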

Invocation

First, here is the pipeline you would use:

jq -cn --stream 'fromstream(1|truncate_stream(inputs))' data.json |
  jq -cn -f groups.jq

groups.jq

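# Use nan as the end-of-stream (EOS) marker.
# The type check keeps isnan from being applied to non-numeric items.
def groups(stream; n):
  foreach (stream,nan) as $x ([];
    if length < n then  . + [$x] else [$x] end;
    if (.[-1] | type == "number" and isnan) and length > 1 then .[:-1]
    elif length == n then .
    else empty end) ;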

groups(inputs; 5)
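
The two stages can be tried end to end on a small input. The sketch below is illustrative only: it uses a group size of 2 and a five-element inline array instead of data.json, inlines the groups definition instead of reading groups.jq, and guards isnan with a type check as a precaution, on the assumption that isnan may reject non-numeric input in some jq versions.

```shell
# Stage 1 streams the top-level elements one at a time;
# stage 2 regroups them into arrays of at most 2 items.
grouped=$(printf '%s\n' '[{"foo":1},{"foo":2},{"foo":3},{"foo":4},{"foo":5}]' |
  jq -cn --stream 'fromstream(1|truncate_stream(inputs))' |
  jq -cn '
    # nan marks the end of the stream so the last partial group is flushed.
    def groups(stream; n):
      foreach (stream, nan) as $x ([];
        if length < n then . + [$x] else [$x] end;
        if (.[-1] | type == "number" and isnan) and length > 1 then .[:-1]
        elif length == n then .
        else empty end);
    groups(inputs; 2)')

printf '%s\n' "$grouped"
# [{"foo":1},{"foo":2}]
# [{"foo":3},{"foo":4}]
# [{"foo":5}]
```

Only one group is held in memory at a time in the second stage, which is what makes this approach suitable for very large inputs.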

How can I use jq to split a JSON file into smaller JSON files, each wrapped in an array?
