I have a document with the following format:
"summary":{
    "HUL":{
        "hr_0":{
            "ts":None,
            "Insights":{
                "sentiments":{
                    "pos":37,
                    "neg":3,
                    "neu":27
                },
                "topics":[
                    "Basketball",
                    "Football"
                ],
                "geo":{
                    "locations":{
                        "Delhi":34,
                        "Kolkata":56,
                        "Pune":79,
                        "Bangalore":92,
                        "Mumbai":54
                    },
                    "mst_act":{
                        "loc":"Bangalore",
                        "lat_long":None
                    }
                }
            }
        },
        "hr_1":{....},
        "hr_2":{....},
        .
        .
        "hr_23":{....}
    }
}
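I want to run an aggregation in pymongo that sums the pos, neg and neu sentiments across all hours of the day, "hr_0" to "hr_23".
I'm having trouble constructing the pipeline command because the fields I'm interested in sit inside nested dictionaries. I'd really appreciate your suggestions.
Thanks!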
Answer 0 (score: 2)
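It is going to be difficult to come up with an aggregation pipeline that gives you the desired totals, because your document schema uses dynamic keys ("hr_0" to "hr_23"), and those cannot be used as an identifier expression in the $group pipeline operator. However, a workaround with the current schema is to iterate over the find cursor and add up the values inside a loop, something like the following: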
# Assumes `db` is an existing pymongo Database handle.
pos_total = 0
neg_total = 0
neu_total = 0
cursor = db.collection.find()
for doc in cursor:
    hours = doc["summary"]["HUL"]
    for i in range(24):  # hr_0 .. hr_23
        sentiments = hours["hr_" + str(i)]["Insights"]["sentiments"]
        pos_total += sentiments["pos"]
        neg_total += sentiments["neg"]
        neu_total += sentiments["neu"]
# Print the totals once, after every document has been processed.
print(pos_total)
print(neg_total)
print(neu_total)
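The loop above raises a KeyError as soon as one document lacks an hour or a sentiment key. A minimal sketch of a more tolerant version follows, under the assumption that missing hours or counters should simply count as zero. `sum_sentiments` is a hypothetical helper name, not part of pymongo, and it accepts any iterable of documents, so in-memory sample data is used here in place of a cursor.

```python
def sum_sentiments(docs, hours=24):
    """Add up pos/neg/neu over hr_0..hr_{hours-1} for every document.

    Missing hours or missing sentiment keys are treated as zero (an assumption).
    """
    totals = {"pos": 0, "neg": 0, "neu": 0}
    for doc in docs:
        hul = doc.get("summary", {}).get("HUL", {})
        for i in range(hours):
            hour = hul.get("hr_%d" % i, {})
            sentiments = hour.get("Insights", {}).get("sentiments", {})
            for key in totals:
                totals[key] += sentiments.get(key, 0)
    return totals


# Two in-memory documents standing in for a cursor; hr_1 is incomplete on purpose.
sample_docs = [
    {"summary": {"HUL": {
        "hr_0": {"Insights": {"sentiments": {"pos": 37, "neg": 3, "neu": 27}}},
        "hr_1": {"Insights": {"sentiments": {"pos": 1}}},
    }}},
    {"summary": {"HUL": {
        "hr_0": {"Insights": {"sentiments": {"pos": 2, "neg": 4, "neu": 6}}},
    }}},
]
print(sum_sentiments(sample_docs))
```

With pymongo, the same helper could be called as sum_sentiments(db.collection.find()), since a cursor is just an iterable of dictionaries.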
If you can change your schema, the following schema would be ideal for use with the aggregation framework:
{
    "summary": {
        "HUL": [
            {
                "_id": "hr_0",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    "topics":[
                        "Basketball",
                        "Football"
                    ],
                    "geo":{
                        "locations":{
                            "Delhi":34,
                            "Kolkata":56,
                            "Pune":79,
                            "Bangalore":92,
                            "Mumbai":54
                        },
                        "mst_act":{
                            "loc":"Bangalore",
                            "lat_long":None
                        }
                    }
                }
            },
            {
                "_id": "hr_2",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            },
            ...
            {
                "_id": "hr_23",
                "ts": None,
                "Insights":{
                    "sentiments":{
                        "pos":37,
                        "neg":3,
                        "neu":27
                    },
                    ...
                }
            }
        ]
    }
}
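Moving existing documents to the array-based schema above means turning each "hr_N" key into the "_id" of a list element. The following is a rough sketch of that conversion in plain Python. `hul_dict_to_list` is a hypothetical helper name, and the choice to order entries by hour number and to drop any keys in HUL that do not look like "hr_N" is an assumption, not something the schema requires.

```python
import re

_HOUR_KEY = re.compile(r"^hr_(\d+)$")


def hul_dict_to_list(doc):
    """Return a copy of `doc` whose summary.HUL is a list of hour sub-documents.

    Each hr_N key becomes the `_id` of its sub-document, ordered by N.
    Keys in HUL that do not match hr_N are dropped (an assumption).
    The input document is not modified.
    """
    hul = doc["summary"]["HUL"]
    hour_keys = sorted(
        (key for key in hul if _HOUR_KEY.match(key)),
        key=lambda key: int(_HOUR_KEY.match(key).group(1)),
    )
    hours = [{"_id": key, **hul[key]} for key in hour_keys]
    new_doc = dict(doc)
    new_doc["summary"] = dict(doc["summary"], HUL=hours)
    return new_doc


# Small in-memory example; hr_10 comes first to show the numeric ordering.
old_doc = {"summary": {"HUL": {
    "hr_10": {"ts": None, "Insights": {"sentiments": {"pos": 1, "neg": 0, "neu": 2}}},
    "hr_2": {"ts": None, "Insights": {"sentiments": {"pos": 5, "neg": 1, "neu": 0}}},
}}}
new_doc = hul_dict_to_list(old_doc)
print([hour["_id"] for hour in new_doc["summary"]["HUL"]])
```

Sorting numerically rather than alphabetically keeps "hr_2" ahead of "hr_10". The converted documents would still have to be written back to the collection, which this sketch leaves out.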
With that schema, the aggregation pipeline that gives you the totals is:
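# Assumes `db` is an existing pymongo Database handle.
pipeline = [
    # Emit one document per element of the HUL array.
    {
        "$unwind": "$summary.HUL"
    },
    # Group on the hour id: each hour slot gets its own totals across all documents.
    {
        "$group": {
            "_id": "$summary.HUL._id",
            "pos_total": {"$sum": "$summary.HUL.Insights.sentiments.pos"},
            "neg_total": {"$sum": "$summary.HUL.Insights.sentiments.neg"},
            "neu_total": {"$sum": "$summary.HUL.Insights.sentiments.neu"},
        }
    }
]
result = db.collection.aggregate(pipeline)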
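Because the pipeline above groups on the hour id, it returns one row of totals per hour slot. If a single set of totals over all 24 hours is wanted, as the question describes, one option is to group on a constant "_id" (or on "$_id" for one row per document). The sketch below shows that variant, plus a second pipeline for the original dictionary-keyed schema that relies on $objectToArray, which is only available on MongoDB 3.4.4 or later. The function names and the `collection` parameter are placeholders, and neither pipeline has been run against the asker's data.

```python
def _sum_fields(prefix):
    """Build the $sum accumulators for the three sentiment counters."""
    return {
        name + "_total": {"$sum": "%s.Insights.sentiments.%s" % (prefix, name)}
        for name in ("pos", "neg", "neu")
    }


def totals_pipeline_array_schema(group_key=None):
    """Pipeline for the array-based schema (summary.HUL is a list).

    group_key=None gives one grand total; "$_id" gives one total per document.
    """
    return [
        {"$unwind": "$summary.HUL"},
        {"$group": dict(_sum_fields("$summary.HUL"), _id=group_key)},
    ]


def totals_pipeline_original_schema(group_key=None):
    """Pipeline for the original schema (summary.HUL is a dict keyed by hr_N).

    $objectToArray turns the dict into [{"k": "hr_0", "v": {...}}, ...],
    which can then be unwound like a normal array.
    """
    return [
        {"$project": {"hours": {"$objectToArray": "$summary.HUL"}}},
        {"$unwind": "$hours"},
        {"$group": dict(_sum_fields("$hours.v"), _id=group_key)},
    ]


def run_totals(collection, pipeline):
    """Run the pipeline on a pymongo collection and return the result rows."""
    return list(collection.aggregate(pipeline))
```

A call such as run_totals(db.collection, totals_pipeline_array_schema()) would then return a one-element list holding pos_total, neg_total and neu_total.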