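I'm on a Mac and have a very large .json file containing more than 100,000 objects.
I want to split it into multiple files (ideally 50-100).
Source
The original .json file is an array of objects that looks something like this: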
[{
"id": 1,
"item_a": "this1",
"item_b": "that1"
}, {
"id": 2,
"item_a": "this2",
"item_b": "that2"
}, {
"id": 3,
"item_a": "this3",
"item_b": "that3"
}, {
"id": 4,
"item_a": "this4",
"item_b": "that4"
}, {
"id": 5,
"item_a": "this5",
"item_b": "that5"
}]
Desired output
If this were split into three files, I'd like the output to look like this:
File 1:
[{
"id": 1,
"item_a": "this1",
"item_b": "that1"
}, {
"id": 2,
"item_a": "this2",
"item_b": "that2"
}]
File 2:
[{
"id": 3,
"item_a": "this3",
"item_b": "that3"
}, {
"id": 4,
"item_a": "this4",
"item_b": "that4"
}]
File 3:
[{
"id": 5,
"item_a": "this5",
"item_b": "that5"
}]
Any ideas would be greatly appreciated. Thanks!
Answer 0 (score: 3)
Perl to the rescue:
#!/usr/bin/perl
use warnings;
use strict;
use JSON;
my $file_count = 5; # You probably want 50 - 100 here.
# Slurp the whole input file into one string.
my $json_text = do {
    local $/;
    open my $IN, '<', '1.json' or die $!;
    <$IN>
};
my $arr = decode_json($json_text);
# Integer base size per file; $rest files each take one extra element.
my $size = int(@$arr / $file_count);
my $rest = @$arr % $file_count;
my $i = 1;
while (@$arr) {
    open my $OUT, '>', "file$i.json" or die $!;
    my @chunk = splice @$arr, 0, $size;
    # Grow the chunk size exactly once, so only the last $rest files are larger.
    ++$size if $i++ == $file_count - $rest;
    print {$OUT} encode_json(\@chunk);
    close $OUT or die $!;
}
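For readers who would rather not install the Perl JSON module, here is a rough Python sketch of the same size/remainder idea. The function names split_evenly and write_chunks, and the file prefix, are made up for this illustration; it assumes the whole array fits in memory, which should be fine for around 100,000 small objects.

```python
import json
from pathlib import Path


def split_evenly(items, file_count):
    """Split items into file_count consecutive chunks whose sizes differ by at most one."""
    size, rest = divmod(len(items), file_count)
    chunks = []
    start = 0
    for index in range(file_count):
        # The last `rest` chunks each take one extra item.
        length = size + (1 if index >= file_count - rest else 0)
        chunks.append(items[start:start + length])
        start += length
    return chunks


def write_chunks(chunks, out_dir=".", prefix="file"):
    """Write each chunk as its own JSON array: file1.json, file2.json, ..."""
    out_dir = Path(out_dir)
    paths = []
    for number, chunk in enumerate(chunks, start=1):
        path = out_dir / f"{prefix}{number}.json"
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

To use it, load the input with json.loads(Path("1.json").read_text(encoding="utf-8")), pass the resulting list to split_evenly along with the number of files you want, and hand the result to write_chunks. If there are fewer objects than files, some of the output files will contain an empty array.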
Answer 1 (score: 3)
@choroba's answer is very efficient and flexible.
I have a bash solution that uses jq.
#!/bin/bash
size=2    # objects per output file
i=0
file=0
tmp=$(mktemp)    # holds the objects of the current chunk, one per line
# jq -c prints one object per line; read -r keeps spaces and backslashes intact.
while IFS= read -r f; do
    if [ "$i" -eq "$size" ]; then
        jq --slurp '.' "$tmp" > "File$file.json"
        : > "$tmp"    # empty the chunk buffer
        ((file = file + 1))
        i=0
    fi
    printf '%s\n' "$f" >> "$tmp"
    ((i = i + 1))
done < <(jq -c -M '.[]' data.json)
# Write whatever is left over as the last file.
if [ "$i" -gt 0 ]; then
    jq --slurp '.' "$tmp" > "File$file.json"
fi
rm "$tmp"    # cleanup
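The script above splits by a fixed number of objects per file rather than by a target number of files. If jq isn't available, the same fixed-size idea can be sketched in Python as follows. The names chunk_by_size and split_file are hypothetical, and the sketch assumes the input is a single top-level JSON array that fits in memory.

```python
import json
from pathlib import Path


def chunk_by_size(items, size):
    """Return consecutive slices of items, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[start:start + size] for start in range(0, len(items), size)]


def split_file(source, size, prefix="File"):
    """Load the JSON array in `source` and write File0.json, File1.json, ... next to it."""
    source = Path(source)
    items = json.loads(source.read_text(encoding="utf-8"))
    paths = []
    for number, chunk in enumerate(chunk_by_size(items, size)):
        path = source.with_name(f"{prefix}{number}.json")
        path.write_text(json.dumps(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

With the five-object sample from the question and a size of 2, this gives three files holding 2, 2 and 1 objects, which matches the layout asked for. For roughly 100,000 objects and 50-100 files, a size between 1,000 and 2,000 would be a sensible starting point.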
Answer 2 (score: 1)
$ cat tst.awk
/{/ && (++numOpens % 2) {
    if (++numOuts > 1) {
        print out, "}]"
        close(out)
    }
    out = "out" numOuts
    $0 = "[{"
}
{
    # print > out
    print out, $0
}
$ awk -f tst.awk file
out1 [{
out1 "id": 1,
out1 "item_a": "this1",
out1 "item_b": "that1"
out1 }, {
out1 "id": 2,
out1 "item_a": "this2",
out1 "item_b": "that2"
out1 }]
out2 [{
out2 "id": 3,
out2 "item_a": "this3",
out2 "item_b": "that3"
out2 }, {
out2 "id": 4,
out2 "item_a": "this4",
out2 "item_b": "that4"
out2 }]
out3 [{
out3 "id": 5,
out3 "item_a": "this5",
out3 "item_b": "that5"
out3 }]
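Once you've tested it and are happy with the output, delete the print out, $0 line and uncomment # print > out. The print out, "}]" line inside the if block needs the same treatment: change it to print "}]" > out so that the closing bracket goes to the file rather than to stdout.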
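Whichever approach you use, it's worth checking that nothing was lost or reordered before you delete the original. Here is a minimal Python sketch of such a check; the helper names load_array and split_is_lossless are invented for this example, and it assumes every part file holds a JSON array.

```python
import json
from pathlib import Path


def load_array(path):
    """Parse one JSON file and insist that its top-level value is an array."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return data


def split_is_lossless(original_path, part_paths):
    """Return True if the parts, concatenated in the given order, equal the original array."""
    combined = []
    for part in part_paths:
        combined.extend(load_array(part))
    return combined == load_array(original_path)
```

Pass the part files in numeric order, for example ["out1", "out2", "out3"]. A plain alphabetical sort would put out10 before out2 and make the check fail even when the split itself is fine.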