Given this data:
{
"_id" : ObjectId("576948b4999274493425c08a"),
"virustotal" : {
"scan_id" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078-1465973544",
"sha1" : "fd177b8c50b457dbec7cba56aeb10e9e38ebf72f",
"resource" : "4a6c3dfc6677a87aee84f4b629303c40bb9e1dda283a67236e49979f96864078",
"response_code" : 1,
"scan_date" : "2016-06-15 06:52:24",
"results" : [
{
"sig" : "Gen:Variant.Mikey.29601",
"vendor" : "MicroWorld-eScan"
},
{
"sig" : null,
"vendor" : "nProtect"
},
{
"sig" : null,
"vendor" : "CAT-QuickHeal"
},
{
"sig" : "HEUR/QVM07.1.0000.Malware.Gen",
"vendor" : "Qihoo-360"
}
]
}
},
{
"_id" : ObjectId("5768f214999274362f714e8b"),
"virustotal" : {
"scan_id" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391-1466529838",
"sha1" : "fb865b8f0227e9097321182324c959106fcd8c27",
"resource" : "3d283314da4f99f1a0b59af7dc1024df42c3139fd6d4d4fb4015524002b38391",
"response_code" : 1,
"scan_date" : "2016-06-21 17:23:58",
"results" : [
{
"sig" : null,
"vendor" : "Bkav"
},
{
"sig" : null,
"vendor" : "ahnlab"
},
{
"sig" : null,
"vendor" : "MicroWorld-eScan"
},
{
"sig" : "Mal/DrodZp-A",
"vendor" : "Qihoo-360"
}
]
}
}
I am trying to group by vendor and count only the entries where sig is not null, to get something like this:
{
"_id" : "Qihoo-360",
"count" : 2
},
{
"_id" : "MicroWorld-eScan",
"count" : 1
},
{
"_id" : "Bkav",
"count" : 0
},
{
"_id" : "CAT-QuickHeal",
"count" : 0
}
Currently, with this code:
db.analysis.aggregate([
{ $unwind: "$virustotal.results" },
{
$group : {
_id : "$virustotal.results.vendor",
count : { $sum : 1 }
}
},
{ $sort : { count : -1 } }
])
I get everything counted:
{
"_id" : "Qihoo-360",
"count" : 2
},
{
"_id" : "MicroWorld-eScan",
"count" : 2
},
{
"_id" : "Bkav",
"count" : 1
},
{
"_id" : "CAT-QuickHeal",
"count" : 1
}
How can I count 0 when sig is null?
Answer 0 (score: 1)
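You need a conditional expression inside the $sum operator that uses the comparison operator $gt to check whether the "$virustotal.results.sig" key is null (see the BSON comparison order in the documentation). Since null sorts below strings in that order, the comparison is true only when sig holds a value.
You can restructure your pipeline by adding the following expression: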
db.analysis.aggregate([
{ "$unwind": "$virustotal.results" },
{
"$group" : {
"_id": "$virustotal.results.vendor",
"count" : {
"$sum": {
"$cond": [
{ "$gt": [ "$virustotal.results.sig", null ] },
1, 0
]
}
}
}
},
{ "$sort" : { "count" : -1 } }
])
Sample output
/* 1 */
{
"_id" : "Qihoo-360",
"count" : 2
}
/* 2 */
{
"_id" : "MicroWorld-eScan",
"count" : 1
}
/* 3 */
{
"_id" : "Bkav",
"count" : 0
}
/* 4 */
{
"_id" : "CAT-QuickHeal",
"count" : 0
}
/* 5 */
{
"_id" : "nProtect",
"count" : 0
}
/* 6 */
{
"_id" : "ahnlab",
"count" : 0
}
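To see why the $cond inside $sum produces these counts, the same logic can be sketched in plain Python. This is only an illustration over the two sample documents from the question, not a replacement for the pipeline; the function name count_signatures is made up for this sketch, and None stands in for a BSON null.

```python
from collections import Counter

# The two sample documents from the question, reduced to the fields the pipeline reads.
docs = [
    {"virustotal": {"results": [
        {"sig": "Gen:Variant.Mikey.29601", "vendor": "MicroWorld-eScan"},
        {"sig": None, "vendor": "nProtect"},
        {"sig": None, "vendor": "CAT-QuickHeal"},
        {"sig": "HEUR/QVM07.1.0000.Malware.Gen", "vendor": "Qihoo-360"},
    ]}},
    {"virustotal": {"results": [
        {"sig": None, "vendor": "Bkav"},
        {"sig": None, "vendor": "ahnlab"},
        {"sig": None, "vendor": "MicroWorld-eScan"},
        {"sig": "Mal/DrodZp-A", "vendor": "Qihoo-360"},
    ]}},
]

def count_signatures(documents):
    """Hypothetical helper: per-vendor count of non-null sig values."""
    counts = Counter()
    for doc in documents:
        for result in doc["virustotal"]["results"]:  # plays the role of $unwind
            # Plays the role of $cond: add 1 for a non-null sig, otherwise 0.
            # Adding 0 still creates the vendor key, so it shows up with count 0.
            counts[result["vendor"]] += 1 if result["sig"] is not None else 0
    return counts

counts = count_signatures(docs)
# Plays the role of $sort: highest count first.
for vendor, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(vendor, n)
```

The point of adding 0 instead of filtering the null entries out beforehand is that every vendor still ends up in the result, which is what the question asked for.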
Answer 1 (score: 0)
I replaced null with None and the numbers went up, but they still don't look right. Basically, running the query in the mongo shell I get something like
{ "_id" : "Kaspersky", "count" : 176.0 }
and from Python: Kaspersky 64
One of them is wrong :)
So I tried to find out which part of the Python query is written incorrectly compared to the mongo shell. I ran a simple query.
In the mongo shell:
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})
Result: 176
db.analysis.count( { "virustotal.results" : { $elemMatch : { "vendor": "Kaspersky", "sig": {$gt: null} } }})
Result: 0
Then I tried in Python:
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "null"} } }})
Result: 568
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$ne": "None"} } }})
Result: 568
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "None"} } }})
Result: 64
rtmp = results_db.analysis.count( { "virustotal.results" : { "$elemMatch" : { "vendor": "Kaspersky", "sig": {"$gt": "null"} } }})
Result: 6
It is hard to say which value is correct! I think it is 176, but I cannot reproduce it in Python...
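One likely cause of the differing counts, assuming the quoted values in the Python queries were meant as null checks: "None" and "null" in quotes are ordinary strings, so {"$ne": "None"} keeps the null entries and {"$gt": "None"} becomes a plain string comparison. The sketch below imitates the three conditions on a made-up list of sig values for one vendor (the data is hypothetical, and None stands in for a BSON null); it does not talk to MongoDB.

```python
# Hypothetical sig values for a single vendor; None stands for a BSON null.
sigs = [None, "HEUR:Trojan.Generic", None, "Trojan.Agent", "not-a-virus:AdWare", None]

# Like {"$ne": "None"}: compares against the four-character string "None",
# so the null entries are NOT excluded.
ne_string = sum(1 for s in sigs if s != "None")

# Like {"$gt": "None"}: a string comparison, true only for strings
# that sort after "None" ("HEUR..." does not, "Trojan..." does).
gt_string = sum(1 for s in sigs if isinstance(s, str) and s > "None")

# Like {"$ne": None}: the actual null check.
ne_null = sum(1 for s in sigs if s is not None)

print(ne_string, gt_string, ne_null)
```

With pymongo, Python's None (unquoted) is what gets sent as a BSON null, so the null check would be written as {"$ne": None} or {"$gt": None} rather than with a quoted string.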