Read values of a key from files and output a random number of them to a file?

Time: 2017-04-30 17:16:54

Tags: python arrays json parsing

I have a dozen files with the following structure (one JSON object per line):

{"reviewerID": "A4IL0CLL27Q33", "asin": "104800001X", "reviewerName": "D. Brennan", "helpful": [0, 1], "reviewText": "I hate it when my shirt collars, not otherwise secured in place by buttons, end up in weird places throughout the day. I purchased some steel collar stays to use with these magnets but they were only vaguely magnetic. I ended up using 2 of these magnets - one in the collar with the stay and the other inside my shirt, to lock my collar in place. They work flawlessly. They are the perfect size, and there are plenty of magnets in case you forget to remove them at the end of the day.", "overall": 5.0, "summary": "Perfect for collar stay management", "unixReviewTime": 1390953600, "reviewTime": "01 29, 2014"}
{"reviewerID": "A3Q5W5E7TDVLJF", "asin": "104800001X", "reviewerName": "funnyc130", "helpful": [0, 0], "reviewText": "These little magnets are really powerful for there size. I am using them to make secret compartments in custom made boxes. Each one hols about .8 of a pound.", "overall": 5.0, "summary": "Neat", "unixReviewTime": 1369958400, "reviewTime": "05 31, 2013"}

Each file contains hundreds of thousands of lines like these.

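How can I randomly pick 1000 values of the key reviewText from across all the files? The final output should be saved in a text file, with one reviewText value per line.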

The large files containing the Amazon reviews can be obtained here:
http://jmcauley.ucsd.edu/data/amazon/

The sample above comes from this archive:
http://snap.stanford.edu/data/amazon/productGraph/categoryFiles/reviews_Tools_and_Home_Improvement_5.json.gz

1 answer:

Answer 0: (score: 1)

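If Python is not a requirement, you can use jq to parse the JSON on the command line and then pick 1000 lines at random from its output. Without the -r option, jq prints each value as a JSON-quoted string, so every review stays on a single line.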

jq '.reviewText' reviews*.json | shuf | head -n1000
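
If you would rather stay in Python, something along these lines might work. It is only a minimal sketch using reservoir sampling (Algorithm R), which is a technique swapped in here and not something from the answer above. It assumes every non-empty line is a complete JSON object. The function names, the seed parameter and the whitespace flattening in write_sample are all made up for illustration.

```python
import json
import random


def sample_review_texts(paths, k=1000, key="reviewText", seed=None):
    """Return up to k values of `key`, drawn uniformly from all lines in `paths`.

    Reservoir sampling keeps at most k values in memory, so the files are
    streamed once and never loaded whole.
    """
    rng = random.Random(seed)  # seed is optional; pass one for repeatable runs
    reservoir = []
    seen = 0  # number of records with `key` encountered so far
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)  # assumption: one JSON object per line
                if key not in record:
                    continue
                seen += 1
                if len(reservoir) < k:
                    # Fill the reservoir with the first k values.
                    reservoir.append(record[key])
                else:
                    # Keep the new value with probability k / seen.
                    j = rng.randrange(seen)
                    if j < k:
                        reservoir[j] = record[key]
    return reservoir


def write_sample(texts, out_path):
    """Write one value per line, collapsing whitespace inside each value."""
    with open(out_path, "w", encoding="utf-8") as out:
        for text in texts:
            # Assumption: flattening embedded newlines is acceptable, so that
            # each review occupies exactly one line of the output file.
            out.write(" ".join(text.split()) + "\n")
```

To use it on the files from the question you could, for example, `import glob` and call `write_sample(sample_review_texts(glob.glob("reviews*.json")), "sample_reviews.txt")`; the output file name is just a placeholder. Unlike the shuf pipeline, this never holds more than k values in memory at once.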