Using jq to convert a large .json file to newline-delimited JSON

Date: 2019-07-15 10:55:06

Tags: json powershell google-bigquery jq

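I need to convert single-line JSON files to newline-delimited format so the data can be loaded into BigQuery. The files are large (2 to 50 GB), so the command

cat C:\file.json | jq -c '.[]' > C:\file_ND.json

fails because it runs out of memory.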

I also tried:

jq -c '.[]' C:\file.json > C:\file_ND.json

but it produces an empty output file.

Any ideas on how to do this with --stream in jq? I cannot find where to supply the input and output files or how to describe the format.

Sample file structure:

[{"action":"U","cd":"2018-06-04T16:54:53.000+02:00","md":"2019-06-01T04:44:22.000+02:00","o":{"_id":3298153,"_type":"acc","parent":{"_id":3298153,"_type":"b","pb":0,"dp":0,"lb":0},"s":"X","sChangeDate":"2018-06-04T16:54:53.000+02:00","owner":8008711577,"aTypeID":1302,"pEx":false,"cAcc":false,"trnSID":3341650,"eo":false}},{"action":"U","cd":"2018-06-04T16:57:47.000+02:00","md":"2019-06-13T14:48:45.000+02:00","o":{"_id":3298372,"_type":"acc","parent":{"_id":3298372,"_type":"ab","pb":0,"dp":0,"lb":0},"s":"X","sChangeDate":"2018-06-04T16:57:47.000+02:00","owner":8008711796,"aTypeID":1302,"pEx":false,"cAcc":false,"trnSID":3342088,"eo":false}},{"action":"U","cd":"2018-07-13T00:53:30.000+02:00","md":"2019-06-11T18:49:03.000+02:00","o":{"_id":3667579,"_type":"acc","parent":{"_id":3667579,"_type":"ab","pb":0,"dp":0,"lb":0},"s":"X","sChangeDate":"2018-07-13T00:53:30.000+02:00","owner":8009080658,"aTypeID":1302,"pEx":false,"cAcc":false,"trnSID":4077943,"eo":false}},{"action":"U","cd":"2018-07-13T12:55:55.000+02:00","md":"2019-06-17T05:42:38.000+02:00","o":{"_id":3672013,"_type":"acc","parent":{"_id":3672013,"_type":"ab","pb":0,"dp":0,"lb":0},"s":"X","sChangeDate":"2018-07-13T12:55:55.000+02:00","owner":8009085060,"aTypeID":1302,"pEx":false,"cAcc":false,"trnSID":4086704,"eo":false}},
... ,
... ,
... ,
]
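
To make the target format concrete, here is a minimal Python sketch of the transformation that `jq -c '.[]'` performs: one compact JSON object per output line. It loads the whole array into memory, so it only illustrates the output shape and is not suitable for multi-gigabyte files. The function name `to_ndjson` and the shortened sample records are made up for illustration.

```python
import json


def to_ndjson(text):
    """Return the elements of a JSON array as newline-delimited compact JSON."""
    records = json.loads(text)  # parses the whole array at once (in memory)
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


# Hypothetical, heavily shortened records in the shape of the sample above.
sample = '[{"action":"U","o":{"_id":1,"eo":false}},{"action":"U","o":{"_id":2,"eo":false}}]'

if __name__ == "__main__":
    print(to_ndjson(sample), end="")
```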

What should the jq --stream command look like?

jq -c --stream '.[] ????????????'

Edit

After upgrading jq from 1.5 to 1.6, the problem appears to be solved. The command line used:

jq -nc --stream 'fromstream(1|truncate_stream(inputs))' C:\input.json > C:\output.json 
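
The jq command above emits each top-level array element without first holding the whole array in memory. For comparison, the same idea can be sketched in Python using only the standard library. This is not the method used in the question, just an alternative under stated assumptions: the input is a single top-level JSON array, it is read as text in chunks, and each element is decoded with `json.JSONDecoder.raw_decode`. The names `iter_array_items` and `array_to_ndjson` are hypothetical. The sketch skips commas leniently and re-parses an element whenever it spans a chunk boundary, so very large individual elements would be slow.

```python
import io
import json

_WS = " \t\r\n"


def iter_array_items(fp, chunk_size=1 << 16):
    """Yield the elements of a top-level JSON array read from text stream `fp`,
    buffering roughly one chunk plus the element currently being decoded."""
    decoder = json.JSONDecoder()
    buf = fp.read(chunk_size)
    pos = 0
    while pos < len(buf) and buf[pos] in _WS:
        pos += 1
    if pos >= len(buf) or buf[pos] != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    while True:
        # Skip whitespace and the commas between elements (lenient).
        while pos < len(buf) and buf[pos] in _WS + ",":
            pos += 1
        if pos < len(buf) and buf[pos] == "]":
            return
        complete = False
        if pos < len(buf):
            try:
                item, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                pass  # the element probably continues in the next chunk
            else:
                while end < len(buf) and buf[end] in _WS:
                    end += 1
                # Accept only if a separator or the closing bracket follows,
                # so a number split across two chunks is not cut short.
                complete = end < len(buf) and buf[end] in ",]"
        if complete:
            yield item
            pos = end
        else:
            more = fp.read(chunk_size)
            if not more:
                raise ValueError("truncated or invalid JSON array")
            buf = buf[pos:] + more  # drop the part already consumed
            pos = 0


def array_to_ndjson(src, dst):
    """Write each array element of `src` to `dst` as one compact JSON line."""
    count = 0
    for item in iter_array_items(src):
        dst.write(json.dumps(item, separators=(",", ":"), ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical in-memory input standing in for the real file.
    demo = io.StringIO('[{"action": "U", "o": {"_id": 1}}, {"action": "U", "o": {"_id": 2}}]')
    out = io.StringIO()
    array_to_ndjson(demo, out)
    print(out.getvalue(), end="")
```

For real files, `src` and `dst` would be file objects opened in text mode with an explicit encoding (for example `encoding="utf-8"`), so that only a small buffer is resident at any time.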

0 answers:

No answers yet.