Python: convert mongodump's bson output to an array of JSON objects (dictionaries)

Time: 2015-12-16 19:10:50

Tags: python json mongodb pymongo bson

I dumped a mongodb collection with the mongodump command. The output is a dump directory containing the following files:

dump/
    |___coll.bson
    |___coll.metadata.json

How can I open the dumped files as an array of dictionaries in Python? I tried the following, but it did not work:

with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()
import json
coll = json.loads(coll_raw)

# Using pymongo
from bson.json_util import loads
coll = loads(coll_raw)

ValueError: No JSON object could be decoded

2 answers:

Answer 0 (score: 4)

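A .bson file is binary BSON, not JSON text, so json.loads cannot parse it. You should try decoding it with the bson package that ships with pymongo: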

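import bson  # the bson package that ships with pymongo

with open('dump/coll.bson', 'rb') as f:
    coll_raw = f.read()

# decode_all returns a list of dicts, one per document in the dump
coll = bson.decode_all(coll_raw)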
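
The list returned by decode_all holds ordinary dicts, but values such as ObjectId and datetime are not serializable by the standard json module. If JSON text is the goal, one option is a fallback converter. The following is only a minimal sketch, assuming plain string forms are acceptable; the helper names to_jsonable and docs_to_json are hypothetical and not part of pymongo. When MongoDB Extended JSON is wanted instead, pymongo's bson.json_util.dumps is the usual choice.

```python
import datetime
import json


def to_jsonable(value):
    """Fallback for values the json module cannot serialize itself."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.isoformat()
    if isinstance(value, bytes):
        return value.hex()
    # Assumption: str() is an acceptable form for everything else,
    # e.g. str() of an ObjectId gives its 24-character hex id.
    return str(value)


def docs_to_json(docs):
    """Serialize a list of decoded documents (dicts) to a JSON array string."""
    return json.dumps(docs, default=to_jsonable)
```

With the answer's variable, this would be called as docs_to_json(coll).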

Answer 1 (score: 0)

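I know this was answered a long time ago, but you can try decoding each document individually; that way you find out which document is causing the problem.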

I use this library: https://github.com/bauman/python-bson-streaming

from bsonstream import KeyValueBSONInput

# Decode the dump one document at a time instead of all at once
with open("restaurants.bson", 'rb') as f:
    stream = KeyValueBSONInput(fh=f)
    for dict_data in stream:
        print(dict_data)

I see 25359 records, all of which seem to decode to something like:

{u'_id': ObjectId('5671bb2e111bb7b9a7ce4d9a'),
 u'address': {u'building': u'351',
              u'coord': [-73.98513559999999, 40.7676919],
              u'street': u'West   57 Street',
              u'zipcode': u'10019'},
 u'borough': u'Manhattan',
 u'cuisine': u'Irish',
 u'grades': [{u'date': datetime.datetime(2014, 9, 6, 0, 0),
              u'grade': u'A',
              u'score': 2},
             {u'date': datetime.datetime(2013, 7, 22, 0, 0),
              u'grade': u'A',
              u'score': 11},
             {u'date': datetime.datetime(2012, 7, 31, 0, 0),
              u'grade': u'A',
              u'score': 12},
             {u'date': datetime.datetime(2011, 12, 29, 0, 0),
              u'grade': u'A',
              u'score': 12}],
 u'name': u'Dj Reynolds Pub And Restaurant',
 u'restaurant_id': u'30191841'}
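
The per-document idea can also be sketched with the standard library alone. A mongodump .bson file is a plain concatenation of BSON documents, and each document starts with a little-endian int32 giving its total size (including those 4 bytes) and ends with a zero byte. The function name iter_bson_documents below is hypothetical; it only splits the file into raw documents and does not decode them.

```python
import struct


def iter_bson_documents(fh):
    """Yield (index, raw_bytes) for each BSON document in a binary file object."""
    index = 0
    while True:
        header = fh.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix at document %d" % index)
        # Little-endian int32: total document size, including these 4 bytes.
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError("invalid size %d at document %d" % (size, index))
        body = fh.read(size - 4)
        if len(body) != size - 4 or body[-1:] != b"\x00":
            raise ValueError("corrupt document %d" % index)
        yield index, header + body
        index += 1
```

Each raw chunk can then be handed to a decoder inside a try/except (for example bson.BSON(raw).decode() from pymongo), so that a failure reports the index of the offending document rather than aborting the whole load.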