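I am building a web scraper and trying to assign a UUID to each entity.
Since the same entity may be scraped at different times, I want to store the initial UUID together with the ID extracted from the web page.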
// example document
{
    "ent_eid_type": "ABC-123",
    "ent_uid_type": "123e4567-aaa-123e456"
}
The code below runs for each id field found in a scraped item:
# if the current ent_eid_type is a key in mongo...
if db_coll.find({ent_eid_type: ent_eid}).count() > 0:
    # return the uid value
    ent_uid = db_coll.find({ent_uid_type: ent_uid })
else:
    # create a fresh uid
    ent_uid = uuid.uuid4()
    # store it with the current entity eid as key, and uid as value
    db_coll.insert({ent_eid_type: ent_eid, ent_uid_type: ent_uid})
# update the current item with the stored uid for later use
item[ent_uid_type] = ent_uid
The console returns KeyError: <pymongo.cursor.Cursor object at 0x104d41710>. I am not sure how to get ent_uid out of the query result.
Any hints or suggestions are appreciated!
Answer 0 (score: 1)
PyMongo's find() returns a Cursor object, not a document. You need to iterate over it or index into it to get the actual document.
Access the first result (you already checked that one exists), then read the UID field from it.
Presumably you meant to search on the EID type with ent_eid, not ent_uid. There is no reason to search for a value you already have.
ent_uid = db_coll.find({ent_eid_type: ent_eid})[0][ent_uid_type]
Or skip the cursor entirely and use find_one, which returns the matching document itself, or None if nothing matches (http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.find_one):
ent_uid = db_coll.find_one({ent_eid_type: ent_eid})[ent_uid_type]
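
Putting the pieces together, the whole lookup-or-create step might look roughly like the sketch below. It is only an illustration under some assumptions: `get_or_create_uid` is a made-up helper name, and `InMemoryCollection` is a hypothetical stand-in that mimics just `find_one` and `insert_one` so the snippet runs without a MongoDB server. With a real PyMongo collection you would pass `db_coll` instead. The sketch uses `insert_one` because `Collection.insert` and `Cursor.count` were removed in PyMongo 4, and it stores the UUID as a string, which sidesteps the driver's UUID encoding settings.

```python
import uuid


class InMemoryCollection:
    """Hypothetical stand-in for a PyMongo collection (find_one/insert_one only)."""

    def __init__(self):
        self._docs = []

    def find_one(self, query):
        # Return the first document matching every key/value in the query, else None.
        for doc in self._docs:
            if all(doc.get(key) == value for key, value in query.items()):
                return doc
        return None

    def insert_one(self, doc):
        # Store a copy so later changes to the caller's dict do not leak in.
        self._docs.append(dict(doc))


def get_or_create_uid(coll, eid_field, eid_value, uid_field):
    """Return the UID stored for eid_value, creating and storing one if absent."""
    doc = coll.find_one({eid_field: eid_value})
    if doc is not None:
        # A document exists: read the UID field from the document itself.
        return doc[uid_field]
    # No document yet: create a fresh UID and store it next to the EID.
    new_uid = str(uuid.uuid4())
    coll.insert_one({eid_field: eid_value, uid_field: new_uid})
    return new_uid


coll = InMemoryCollection()
first = get_or_create_uid(coll, "ent_eid_type", "ABC-123", "ent_uid_type")
second = get_or_create_uid(coll, "ent_eid_type", "ABC-123", "ent_uid_type")
other = get_or_create_uid(coll, "ent_eid_type", "XYZ-999", "ent_uid_type")
```

Here `first` and `second` are the same string because the second call finds the stored document, while `other` gets its own UID. Note that check-then-insert is not atomic: if several scraper processes can run at once, a unique index on the EID field is one way to guard against duplicate documents.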