My MongoDB cursor returns documents that look like this:
{
    "_id": ObjectId("57558ee01807ce2f774569cc"),
    "description": "Lorem Ipnsun ....",
    "results": [
        {
            "name": "Alica James",
            "gender": "male"
        },
        {
            "name": "Alica James",
            "gender": "female"
        },
        {
            "name": "Alica James",
            "gender": "female"
        }
    ]
},
{
    "_id": ObjectId("57558ee01807ce2f774569c6"),
    "description": "Lorem Ipnsun ....",
    "results": [
        {
            "name": "Van Ban",
            "gender": "unclear"
        }
    ]
},
{
    "_id": ObjectId("57558ee01807ce2f774569c7"),
    "description": "Lorem Ipnsun ....",
    "results": []
}
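As you can see, the results key can be empty or can contain values. Each entry in it has a name field and a gender field, and the gender can be male, female, or unclear.
I want to find all the documents in my collection, then go through each one and check the gender distribution for each name.
So for the name "Alica James", I would like my query to give:
female_numbers_for_document = 2
male_numbers_for_document = 1
unclear_numbers_for_document = 0
And for "Van Ban":
female_numbers_for_document = 0
male_numbers_for_document = 0
unclear_numbers_for_document = 1
In Python, I started by fetching all the documents in the collection and iterating over each document in the cursor. I then declared some variables to hold the gender counts, but this does not work: it only takes the first value and never goes through results. The code looks like this: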
def find_gender_distribution(self):
    cursor = self.mongo.db[self.collection_name].find()
    for document in cursor:
        female_numbers_for_document = document.find({"results.gender": "female"}).count()
        male_numbers_for_document = document.find({"results.gender": "male"}).count()
        unclear_numbers_for_document = document.find({"results.gender": "unclear"}).count()
I don't know how to count how many entries in results share the same gender. Please help.
Answer 0 (score: 0)
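You are using the wrong method for this. By default, each item the cursor yields is a plain Python dict, so it has no find() method to call. You need to use the .aggregate() method on the collection to access the aggregation pipeline.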
# Emit one document per entry of the "results" array
# (documents whose array is empty are dropped by $unwind).
unwind1 = {"$unwind": "$results"}

# Count the entries for each (name, gender) pair.
group1 = {
    "$group": {
        "_id": {"name": "$results.name", "gender": "$results.gender"},
        "count": {"$sum": 1}
    }
}

# Regroup by name and spread the per-gender counts into separate fields.
group2 = {
    "$group": {
        "_id": "$_id.name",
        "nmale": {
            "$sum": {"$cond": [
                {"$eq": ["$_id.gender", "male"]},
                "$count",
                0
            ]}
        },
        "nfemale": {
            "$sum": {"$cond": [
                {"$eq": ["$_id.gender", "female"]},
                "$count",
                0
            ]}
        },
        "nunclear": {
            "$sum": {"$cond": [
                # Anything that is neither "male" nor "female".
                {"$and": [
                    {"$ne": ["$_id.gender", "male"]},
                    {"$ne": ["$_id.gender", "female"]}
                ]},
                "$count",
                0
            ]}
        }
    }
}

pipeline = [unwind1, group1, group2]

def find_gender_distribution(self):
    collection = self.mongo.db[self.collection_name]
    cursor = collection.aggregate(pipeline)
    for document in cursor:
        print(document)  # or do something

Printing the documents from the cursor yields something like this (the order of the groups is not guaranteed):
{ "_id" : "Alica James", "nmale" : 1, "nfemale" : 2, "nunclear" : 0 }
{ "_id" : "Van Ban", "nmale" : 0, "nfemale" : 0, "nunclear" : 1 }
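
If you want to sanity-check the counts without a running server, the same per-name tally can be sketched client-side in plain Python. This is only an illustration, not a replacement for the aggregation: the function name gender_distribution, the GENDERS tuple, and sample_documents are made up for this example, the sample data is copied from the question, and the sketch counts only the literal value "unclear" rather than every value that is neither male nor female.

```python
from collections import Counter, defaultdict

# Assumption: these are the only three gender values of interest.
GENDERS = ("male", "female", "unclear")

def gender_distribution(documents):
    """Tally genders per name across an iterable of plain-dict documents."""
    counts = defaultdict(Counter)
    for document in documents:
        # A missing or empty "results" list simply contributes nothing.
        for entry in document.get("results", []):
            counts[entry["name"]][entry["gender"]] += 1
    # Fill in explicit zeros so every name reports all three genders.
    return {
        name: {gender: tally[gender] for gender in GENDERS}
        for name, tally in counts.items()
    }

# Sample data mirroring the documents shown in the question.
sample_documents = [
    {"results": [
        {"name": "Alica James", "gender": "male"},
        {"name": "Alica James", "gender": "female"},
        {"name": "Alica James", "gender": "female"},
    ]},
    {"results": [{"name": "Van Ban", "gender": "unclear"}]},
    {"results": []},
]

distribution = gender_distribution(sample_documents)
print(distribution)
```

Passing the cursor returned by collection.find() instead of sample_documents should work the same way, since the function only needs an iterable of dicts. For large collections the aggregation pipeline is usually the better choice, because the counting then happens on the server.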