I need to calculate the 90th percentile of the duration, where each document's duration is defined as finish_time - start_time.
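My plan was to:
1. Use $project to calculate each document's duration, in seconds.
2. Calculate the index that corresponds to the 90th percentile: 90th_percentile_index = 0.9 * amount_of_documents.
3. Sort the documents by the $duration variable.
4. Use 90th_percentile_index to $limit the documents.
I am new to MongoDB, so I assume the query can be improved. The query looks like this: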
db.getCollection('scans').aggregate([
{
$project: {
duration: {
$divide: [{$subtract: ["$finish_time", "$start_time"]}, 1000] // duration is in seconds
},
Percentile90Index: {
$multiply: [0.9, "$total_number_of_documents"] // I don't know how to get the total number of documents..
}
}
},
{
$sort: { duration: 1 }, // sort keys are plain field names, without the "$" prefix
},
{
$limit: "$Percentile90Index"
},
{
$group: {
_id: "_id",
percentiles90: { $max: "$duration" } // selecting the max, i.e. the first document after the limit, should give the result
}
}
])
The problem I have is that I don't know how to get total_number_of_documents, so I can't calculate the index.
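One possible way around the missing count, offered as a minimal sketch rather than a verified solution: collapse the sorted durations into a single array with $group, count them in the same stage with $sum, and pick the element at the percentile index with $arrayElemAt. The helper name percentilePipeline and the output field name percentile are made up for illustration; the stages themselves are standard aggregation operators. The sketch assumes that $push keeps the order produced by the preceding $sort, and that all durations fit within the 16 MB limit for a single document.

```javascript
// Hypothetical helper (not part of MongoDB): builds the aggregation stages for
// the p-th percentile of duration, where p is a whole percent from 1 to 100.
function percentilePipeline(p) {
  return [
    // 1. Duration of each document in seconds (subtracting dates yields milliseconds).
    { $project: { duration: { $divide: [{ $subtract: ["$finish_time", "$start_time"] }, 1000] } } },
    // 2. Sort ascending by the computed field.
    { $sort: { duration: 1 } },
    // 3. Collapse everything into one document: the ordered durations and their count.
    { $group: { _id: null, durations: { $push: "$duration" }, count: { $sum: 1 } } },
    // 4. Pick the element at zero-based index ceil(p * count / 100) - 1.
    {
      $project: {
        _id: 0,
        percentile: {
          $arrayElemAt: [
            "$durations",
            { $subtract: [{ $ceil: { $divide: [{ $multiply: [p, "$count"] }, 100] } }, 1] }
          ]
        }
      }
    }
  ];
}

// Usage in the mongo shell (needs a live database, so it is commented out here):
// db.getCollection('scans').aggregate(percentilePipeline(90))
```

Passing the percentile as a whole percent and dividing by 100 at the end keeps the multiplication in integers, which avoids rounding surprises from a factor such as 0.9.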
Example: suppose I have only 3 documents:
{
"_id" : ObjectId("1"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:01:00.000Z"),
}
{
"_id" : ObjectId("2"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:03:00.000Z"),
}
{
"_id" : ObjectId("3"),
"start_time" : ISODate("2019-02-03T12:00:00.000Z"),
"finish_time" : ISODate("2019-02-03T12:08:00.000Z"),
}
So I would expect the result to be:
{
percentiles50 : 3 // in minutes, since percentiles50=3 is the minimum value that satisfies the requirement that at least 50% of the documents have duration <= percentiles50
}
In the example I used the 50th percentile because I only gave 3 documents, but that doesn't really matter; a query for the i-th percentile would be fine :-)
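To make the definition above concrete, here is a small sketch in plain JavaScript, outside MongoDB, of the "smallest value such that at least p% of the documents have duration <= it" rule (often called the nearest-rank percentile). The function name nearestRankPercentile is hypothetical, and the input is the three example durations in minutes.

```javascript
// Hypothetical helper: nearest-rank percentile of a list of numbers.
// p is a percent in the range (0, 100].
function nearestRankPercentile(values, p) {
  if (values.length === 0) throw new RangeError("values must not be empty");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  // One-based rank of the smallest value that covers at least p% of the items.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Durations of the three example documents, in minutes.
const durationsInMinutes = [1, 3, 8];
console.log(nearestRankPercentile(durationsInMinutes, 50)); // 3
console.log(nearestRankPercentile(durationsInMinutes, 90)); // 8
```

With 3 documents the 50th percentile has rank ceil(1.5) = 2, which is the 3-minute duration, matching the expected result.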