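I have a .txt file containing more than a million JSON entities, generated by a Python program, each with a different set of keys. Here is just an example: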
{
"category": "Athlete",
"website": "example.com",
"talking_about_count": 560,
"description": "xxx",
"id": "123"
}
{
"category": "Community",
"talking_about_count": 0,
"name": "The Second Civil War",
"likes": 26,
"id": "234",
"is_published": true
}
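Although each JSON object has different properties, they share some common ones. The resulting .csv file should contain the columns category, website, talking_about_count, description, id, name, likes and is_published, like this: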
"category","website","talking_about_count","name","likes","description","id","is_published"
"Athlete","example.com","560","","","xxx","123",""
"Community","","0","The Second Civil War","26","","234","True"
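https://json-csv.com/ does this nicely, but it cannot handle datasets with more than 1000 entities.
I want to create a CSV from a .txt file containing a million JSON entities, and I would like to know whether there is a better way to solve this problem.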
Answer 0 (score: 1)
Here is a solution using jq. If the file filter.jq contains
(reduce (.[]|keys_unsorted[]) as $k ({};.[$k]="")) as $o # object with all keys
| ($o | keys_unsorted), (.[] | $o * . | [.[]]) # generate header and data
| @csv # convert to csv
and data.json contains the sample data, then the command
jq -M -s -r -f filter.jq data.json
will produce the output
"category","website","talking_about_count","description","id","name","likes","is_published"
"Athlete","example.com",560,"xxx","123","","",""
"Community","",0,"","234","The Second Civil War",26,true