Aggregating nested documents in MongoDB

Time: 2015-06-24 10:24:29

Tags: mongodb pymongo aggregation-framework

I have a document in the following format:

"summary":{
     "HUL":{
        "hr_0":{
            "ts":None,
            "Insights":{
                "sentiments":{
                    "pos":37,
                    "neg":3,
                    "neu":27
                    },
                "topics":[
                    "Basketball",
                    "Football"
                    ],
                "geo":{
                    "locations":{
                        "Delhi":34,
                        "Kolkata":56,
                        "Pune":79,
                        "Bangalore":92,
                        "Mumbai":54
                        },
                    "mst_act":{
                        "loc":"Bangalore",
                        "lat_long":None
                        }
                    }
                }
            },
        "hr_1":{....},
        "hr_2":{....},
         .
         .
        "hr_23":{....}

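I want to run an aggregation in pymongo that sums the pos, neg and neu sentiments across all hours of the day, "hr_0" through "hr_23".

I'm having trouble constructing the pipeline because the fields I'm interested in sit inside nested dictionaries. I'd really appreciate your suggestions.

Thanks!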

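1 answer:

Answer 0: (score: 2)

It is hard to come up with an aggregation pipeline that gives you the totals you want, because your document schema uses dynamic keys ("hr_0" to "hr_23"), and you cannot use those as an identifier expression in the $group pipeline operator. However, a workaround with the current schema is to iterate over the find() cursor and add up the values you need in a loop, like this: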

# `db` is assumed to be an existing pymongo Database handle.
pos_total = 0
neg_total = 0
neu_total = 0

cursor = db.collection.find()

for doc in cursor:
    for i in range(0, 24):
        # Sentiment counts for hour i of this document
        sentiments = doc["summary"]["HUL"]["hr_" + str(i)]["Insights"]["sentiments"]
        pos_total += sentiments["pos"]
        neg_total += sentiments["neg"]
        neu_total += sentiments["neu"]

print(pos_total)
print(neg_total)
print(neu_total)
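
The loop above raises a KeyError as soon as one hour or one sentiment field is missing. If that can happen in your data, a more tolerant version could look like the sketch below. The function name `sum_sentiments` and the choice to count missing values as 0 are my assumptions, not something your schema dictates.

```python
def sum_sentiments(docs, hours=24):
    """Sum pos/neg/neu over hr_0..hr_{hours-1} for every document in docs.

    Missing hours or missing sentiment fields are counted as 0
    (an assumption; raise instead if gaps should be treated as errors).
    """
    totals = {"pos": 0, "neg": 0, "neu": 0}
    for doc in docs:
        hul = doc.get("summary", {}).get("HUL", {})
        for i in range(hours):
            hour = hul.get("hr_%d" % i) or {}
            sentiments = (hour.get("Insights") or {}).get("sentiments") or {}
            for key in totals:
                totals[key] += sentiments.get(key, 0) or 0
    return totals


# Small in-memory sample with only two of the 24 hours present
sample = [{"summary": {"HUL": {
    "hr_0": {"Insights": {"sentiments": {"pos": 37, "neg": 3, "neu": 27}}},
    "hr_1": {"Insights": {"sentiments": {"pos": 1, "neg": 2, "neu": 3}}},
}}}]

print(sum_sentiments(sample))  # {'pos': 38, 'neg': 5, 'neu': 30}
```

Since the function only needs an iterable of dictionaries, you should be able to pass it the cursor directly, e.g. `sum_sentiments(db.collection.find())`.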

If you can change the schema, then the following schema would be ideal for use with the aggregation framework:

{
    "summary": {
        "HUL": [
            {
                "_id": "hr_0",          
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    "topics":[
                        "Basketball",
                        "Football"
                    ],
                    "geo":{
                        "locations":{
                            "Delhi":34,
                            "Kolkata":56,
                            "Pune":79,
                            "Bangalore":92,
                            "Mumbai":54
                        },
                        "mst_act":{
                            "loc":"Bangalore",
                            "lat_long":None
                        }
                    }
                }
            },
            {
                "_id": "hr_2",          
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            },
            ...
            {
                "_id": "hr_23",         
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            }
        ]
    }
}
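
If you go down that route, converting an existing document to the array form could be done with something like the sketch below. The helper name `hours_to_array` is made up for illustration, and it assumes every key under "HUL" has the form "hr_N".

```python
def hours_to_array(doc):
    """Return a copy of doc in which summary.HUL is a list of sub-documents.

    Each former key ("hr_0", "hr_1", ...) is stored in the "_id" field of
    its sub-document. Assumes every key under HUL looks like "hr_<number>".
    """
    hul = doc["summary"]["HUL"]
    # Sort by the numeric hour so that "hr_2" comes before "hr_10"
    keys = sorted(hul, key=lambda k: int(k.split("_")[1]))
    hours = [dict(hul[k], _id=k) for k in keys]
    new_doc = dict(doc)
    new_doc["summary"] = dict(doc["summary"], HUL=hours)
    return new_doc


old = {"summary": {"HUL": {
    "hr_10": {"ts": None, "Insights": {"sentiments": {"pos": 1, "neg": 0, "neu": 2}}},
    "hr_2": {"ts": None, "Insights": {"sentiments": {"pos": 37, "neg": 3, "neu": 27}}},
}}}

new = hours_to_array(old)
print([hour["_id"] for hour in new["summary"]["HUL"]])  # ['hr_2', 'hr_10']
```

The original dictionary is left untouched. Writing the result back would then be a matter of something like `db.collection.replace_one({"_id": doc["_id"]}, new_doc)` for each document, which I have not run against your data.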

The aggregation pipeline that gives you the required totals is:

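pipeline = [
    {
        # Emit one document per element of the summary.HUL array
        "$unwind": "$summary.HUL"
    },
    {
        # Group by the hour id ("hr_0" ... "hr_23") and sum each sentiment
        "$group": {
            "_id": "$summary.HUL._id",
            "pos_total": { "$sum": "$summary.HUL.Insights.sentiments.pos" },
            "neg_total": { "$sum": "$summary.HUL.Insights.sentiments.neg" },
            "neu_total": { "$sum": "$summary.HUL.Insights.sentiments.neu" },
        }
    }
]

# `db` is assumed to be an existing pymongo Database handle.
result = db.collection.aggregate(pipeline)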
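
Note that grouping on "$summary.HUL._id" produces one row per hour, summed across documents. If what you are after is a single set of totals over all hours, as in the loop version, a variant that groups on a constant key might fit better. This is only a sketch: `build_totals_pipeline` and `daily_totals` are names I made up, and it assumes pymongo 3 or later, where aggregate() returns a cursor.

```python
def build_totals_pipeline():
    """Pipeline that sums the sentiments over every hour of every document."""
    return [
        # One output document per element of the summary.HUL array
        {"$unwind": "$summary.HUL"},
        # A constant _id (None) puts everything into a single group
        {"$group": {
            "_id": None,
            "pos_total": {"$sum": "$summary.HUL.Insights.sentiments.pos"},
            "neg_total": {"$sum": "$summary.HUL.Insights.sentiments.neg"},
            "neu_total": {"$sum": "$summary.HUL.Insights.sentiments.neu"},
        }},
    ]


def daily_totals(collection):
    """Return the single totals row, or None if the collection is empty.

    `collection` is assumed to be a pymongo Collection using the array schema.
    """
    for row in collection.aggregate(build_totals_pipeline()):
        return row
    return None
```

You would call it as `daily_totals(db.collection)`. If each document represents one day and you want one row per day instead, replacing `None` with `"$_id"` in the $group stage should do that.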