I have a JSON file that is roughly 8GB in size. When I try to convert the file with this script:
import csv
import json

infile = open("filename.json", "r")
outfile = open("data.csv", "w")
writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.writerow(row)
I get this error:
Traceback (most recent call last):
File "E:/Thesis/DataDownload/PTDataDownload/demo.py", line 9, in <module>
for row in json.loads(infile.read()):
MemoryError
I'm sure this has to do with the size of the file. Is there a way to make sure the file is converted to CSV without the error?
Here is a sample of my JSON:
{"id":"tag:search.twitter.com,2005:905943958144118786","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:899030045234167808","link":"http://www.twitter.com/NAJajsjs3","displayName":"NAJajsjs","postedTime":"2017-08-19T22:07:20.000Z","image":"https://pbs.twimg.com/profile_images/905943685493391360/2ZavxLrD_normal.jpg","summary":null,"links":[{"href":null,"rel":"me"}],"friendsCount":23,"followersCount":1,"listedCount":0,"statusesCount":283,"twitterTimeZone":null,"verified":false,"utcOffset":null,"preferredUsername":"NAJajsjs3","languages":["tr"],"favoritesCount":106},"verb":"post","postedTime":"2017-09-08T00:00:45.000Z","generator":{"displayName":"Twitter for iPhone","link":"http://twitter.com/download/iphone"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","body":"@thugIyfe Beyonce do better","object":{"objectType":"note","id":"object:search.twitter.com,2005:905943958144118786","summary":"@thugIyfe Beyonce do better","link":"http://twitter.com/NAJajsjs3/statuses/905943958144118786","postedTime":"2017-09-08T00:00:45.000Z"},"inReplyTo":{"link":"http://twitter.com/thugIyfe/statuses/905942854710775808"},"favoritesCount":0,"twitter_entities":{"hashtags":[],"user_mentions":[{"screen_name":"thugIyfe","name":"dari.","id":40542633,"id_str":"40542633","indices":[0,9]}],"symbols":[],"urls":[]},"twitter_filter_level":"low","twitter_lang":"en","display_text_range":[10,27],"retweetCount":0,"gnip":{"matching_rules":[{"tag":null,"id":6134817834619900217,"id_str":"6134817834619900217"}]}}
(Sorry for the ugly formatting)
Another option might be this: I have roughly 8000 smaller JSON files that I combined to make this file. They are each in their own folder, with just the single JSON file in the folder. Would it be easier to convert each of these individually and then combine them into one CSV?
The reason I'm asking is that I have only very basic Python knowledge, and all the answers I've found to similar questions are much more complicated than I can understand. Please help this new Python user read this JSON as a CSV!
Answer 0 (score: 1)
"Would it be easier to convert each of these individually and then combine them into one CSV?"
Yes, it most certainly would.
For example, this will put each JSON object/array (whatever gets loaded from each file) onto its own line of a single CSV:
import json, csv
from glob import glob

with open('out.csv', 'w') as f:
    for fname in glob("*.json"):  # reads every JSON file in the current directory
        with open(fname) as j:
            f.write(str(json.load(j)))  # one parsed object per output line
            f.write('\n')
Use the glob pattern **/*.json to find all the JSON files in nested folders (in Python 3.5+, the ** wildcard only recurses if you also pass recursive=True to glob).
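As a minimal sketch of that recursive variant, assuming the one-JSON-per-folder layout you described:

import json
from glob import glob

with open('out.csv', 'w') as f:
    # recursive=True is what lets "**" descend into each per-file folder
    for fname in glob("**/*.json", recursive=True):
        with open(fname) as j:
            f.write(str(json.load(j)))
            f.write('\n')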
Since your top-level value isn't an array, it's not clear what for row in ... was doing with your data. Unless you want each JSON key to become a CSV column?
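If columns are what you want, here is a minimal sketch using csv.DictWriter; the column names id, verb, and body are only assumptions pulled from the sample tweet above, so swap in whichever top-level keys you actually need:

import json, csv
from glob import glob

fields = ["id", "verb", "body"]  # assumed columns, taken from the sample JSON

with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for fname in glob("*.json"):
        with open(fname) as j:
            tweet = json.load(j)  # each file holds a single JSON object
            writer.writerow({k: tweet.get(k) for k in fields})

Any key missing from a file simply comes out as an empty cell, since dict.get returns None for it.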
Answer 1 (score: 0)
Yes, this can absolutely be done in a very simple way. I opened a 4GB JSON file in a few seconds. In my case I didn't need to convert to CSV, but the same approach handles that easily.
Run the mongoimport command (the file must be accessible inside the running MongoDB container, here at /tmp/data.json):
docker exec -it container_id mongoimport --db test --collection data --file /tmp/data.json --jsonArray
Then run the mongoexport command to export to CSV:
docker exec -it container_id mongoexport --db test --collection data --csv --out data.csv --fields id,objectType
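One caveat: newer versions of the MongoDB tools dropped the --csv flag in favor of --type=csv, so depending on your mongoexport version you may need to write --type=csv in the command above instead.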