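I am trying to export a MongoDB collection to a JSON file and later load the data from that same JSON file into another MongoDB collection. The collection has about 60,000 documents. I wrote the following code: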
from pymongo import MongoClient
import json
from bson.json_util import dumps
from bson import json_util
with open("collections/review.json", "w") as f:
    l = list(reviews_collection.find())
    json.dump(json.dumps(l,default=json_util.default),f,indent = 4)
# reviews_collection_bkp.remove()
reviews_collection_bkp.remove()
with open("collections/review.json") as dataset:
    for line in dataset:
        data = json.loads(line)
        reviews_collection_bkp.insert({
            "reviewId": data["reviewId"],
            "business": data["business"],
            "text": data["text"],
            "stars": data['stars'],
            "votes":data["votes"]
        })
print reviews_collection_bkp.find().count()
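reviews_collection is the collection I want to write to a JSON file named review.json; later I want to read that same file and insert its data into a MongoDB collection. But I think the code does not create a correct JSON file, because reading the file back produces the following error: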
"reviewId": data["reviewId"],
TypeError: string indices must be integers
Why is the generated JSON file not formatted correctly?
Here is sample output for line and data:
"[{\"votes\": {\"funny\": 0, \"useful\": 0, \"cool\": 0}, \"business\": \"wqu7ILomIOPSduRwoWp4AQ\", \"text\": \"Went for breakfast on 6/16/14. We received very good service and meal came within a few minutes.Waitress could have smiled more but was friendly. \\nI had a Grand Slam... it was more than enough food. \\nMeal was very tasty... We will definitely go back. \\nIt is a popular Denny's.\", \"reviewId\": \"0GS3S7UsRGI4B7ziy4cd7Q\", \"stars\": 4, \"_id\": {\"$oid\": \"5711d16fe396f81fcb51dc73\"}},...]
[{"votes": {"funny": 0, "useful": 0, "cool": 0}, "business": "wqu7ILomIOPSduRwoWp4AQ", "text": "Went for breakfast on 6/16/14. We received very good service and meal came within a few minutes.Waitress could have smiled more but was friendly. \nI had a Grand Slam... it was more than enough food. \nMeal was very tasty... We will definitely go back. \nIt is a popular Denny's.", "reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "stars": 4, "_id": {"$oid": "5711d16fe396f81fcb51dc73"}}......]
Answer 0 (score: 0)
Are you sure that every line of the file is valid JSON?
I think this is the correct approach:
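with open("collections/review.json") as dataset:
    # json.load reads from a file object; json.loads expects a string.
    # This assumes the file holds a plain JSON array; a double-encoded
    # file would need a second json.loads(data) here.
    data = json.load(dataset)
    for line in data:
        reviews_collection_bkp.insert({
            "reviewId": line['reviewId'],
            ...
        })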
If this does not work, try printing the generated JSON file to see how it needs to be decoded.
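The error in the question is consistent with the data being encoded twice: json.dumps produces a string, and json.dump then writes that string as a single JSON string literal. The following is a minimal sketch of that effect using only the standard json module. The docs list is a hypothetical stand-in for list(reviews_collection.find()); with real MongoDB documents, bson.json_util would likely be needed to handle ObjectId values.

```python
import json

# Hypothetical sample standing in for list(reviews_collection.find())
docs = [{"reviewId": "0GS3S7UsRGI4B7ziy4cd7Q", "stars": 4}]

# What the question's code effectively does: encode to a string,
# then encode that string a second time
double_encoded = json.dumps(json.dumps(docs))

# Decoding once gives back a string, not a list of dictionaries,
# so once["reviewId"] raises "string indices must be integers"
once = json.loads(double_encoded)
print(type(once).__name__)  # str

# Decoding a second time recovers the list of dictionaries
twice = json.loads(once)

# Encoding only once avoids the problem altogether
single_encoded = json.dumps(docs, indent=4)
restored = json.loads(single_encoded)
print(restored[0]["reviewId"])
```

Writing the file with a single encoding step means the reading side needs only one decoding step.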
Answer 1 (score: 0)
Since your data is a list of dictionaries, you need to iterate over it.
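for line in dataset:
    # The file was written with json.dump(json.dumps(...)), so the first
    # decode returns a string; decode again to get the list of dictionaries
    data = json.loads(json.loads(line))
    for doc in data:
        # Index the current document (doc), not the whole list (data)
        reviews_collection_bkp.insert({
            "reviewId": doc["reviewId"],
            "business": doc["business"],
            "text": doc["text"],
            "stars": doc['stars'],
            "votes": doc["votes"]
        })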
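If reading the file line by line is the goal, one option is to write one JSON document per line, so that each line decodes to a single dictionary. This is only a sketch under assumptions: the sample docs and the temporary file path are hypothetical, and plain json is used in place of bson.json_util, which real documents containing ObjectId values would probably require.

```python
import json
import os
import tempfile

# Hypothetical sample documents standing in for the MongoDB query result
docs = [
    {"reviewId": "r1", "business": "b1", "text": "Good\nservice",
     "stars": 4, "votes": {"funny": 0, "useful": 0, "cool": 0}},
    {"reviewId": "r2", "business": "b2", "text": "Tasty meal",
     "stars": 5, "votes": {"funny": 1, "useful": 2, "cool": 0}},
]

# Temporary location used only for this illustration
path = os.path.join(tempfile.mkdtemp(), "review.json")

# Write one JSON document per line; newlines inside strings are
# escaped by json.dumps, so each document stays on a single line
with open(path, "w") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Read the file back line by line; each line is one dictionary
loaded = []
with open(path) as dataset:
    for line in dataset:
        loaded.append(json.loads(line))

print(len(loaded))
```

Each element of loaded could then be passed to the insert call in place of the hand-built dictionary.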