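I have close to 70 million records in a MongoDB collection, which include (among others) the fields
start: 13653506610,
finish: 13653506650
(these values are Unix epoch seconds, if that matters). For every 30-second interval from the start of the collection to its end, I want to find and aggregate the records that overlap the interval, including the length of each overlap. What is the best way to do this?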
I created an index of the form
db.coll.ensureIndex({start: 1, finish: 1})
but even with this index, a query of the form
db.coll.find({start: {$lt: 13653506630}, finish: {$gte: 13653506600}})
takes two minutes. There must be a better way!
Answer 0 (score: 1)
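This is an interesting one; thanks for the question.
Note: this answer only finds the documents whose period intersects the interval being evaluated (the bottom part of the question). That would be one step in an aggregation pipeline that accomplishes the top part of the question, which is a much bigger problem. You would need a more complete question to get a full answer.
I noticed that your query logic does not exactly match your description, so I tried to guess what you are trying to do and built a test case.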
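To make the matching rule concrete, here is a minimal sketch in plain JavaScript (no database needed). The helper names `overlaps` and `overlapsByCases` are hypothetical, and it assumes inclusive bounds and `start <= finish` on every document. Under those assumptions, "starts inside, ends inside, or spans the interval" collapses to two comparisons, which is the same shape as the query in the question (which uses a strict upper bound instead).

```javascript
// Hypothetical helpers; inclusive bounds and start <= finish are assumptions.

// Two closed ranges intersect exactly when each one begins
// no later than the other one ends.
function overlaps(doc, intervalStart, intervalFinish) {
  return doc.start <= intervalFinish && doc.finish >= intervalStart;
}

// The same rule spelled out as three separate cases.
function overlapsByCases(doc, intervalStart, intervalFinish) {
  const startsInside =
    doc.start >= intervalStart && doc.start <= intervalFinish;
  const endsInside =
    doc.finish >= intervalStart && doc.finish <= intervalFinish;
  const spans = doc.start < intervalStart && doc.finish > intervalFinish;
  return startsInside || endsInside || spans;
}
```

The two-comparison form puts one range condition on each field, whereas the case-by-case form needs a separate branch per case.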
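You should be able to open the mongo shell, run use timeSeries, and then paste this in to verify the concept. The last few lines show how to debug the 70,000,000-document case, including index coverage and execution time.
Note: mongo-hacker makes it easier to inspect this kind of output.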
// USE:
// mongo timeSeries < thisFile
// Clean out previous runs during testing
db.timeSeries1.drop()
// Given a start/finish 30 sec interval, find all documents that were
// active at that time.
// timeSeries1 holds period in epoch seconds the session was active
// Index start and finish independently - our queries use them independently
db.timeSeries1.ensureIndex({start:1})
db.timeSeries1.ensureIndex({finish:1})
// ASSUME: intervals do not overlap [0,29] and [30,59]
var intervalStart = 13653506600;
var intervalFinish = 13653506629;
// Use cases - should find all 5
// 1. active session matches interval exactly
db.timeSeries1.insert({_id:1, start:intervalStart, finish:intervalFinish})
// 2. active session starts and ends within interval
db.timeSeries1.insert({_id:2, start:intervalStart+5, finish:intervalFinish-5})
// 3. active session starts before interval and ends during interval
db.timeSeries1.insert({_id:3, start:intervalStart-5, finish:intervalFinish-5})
// 4. active session starts during interval and ends after interval
db.timeSeries1.insert({_id:4, start:intervalStart+5, finish:intervalFinish+5})
// 5. active session starts before interval and ends after interval
db.timeSeries1.insert({_id:5, start:intervalStart-5, finish:intervalFinish+5})
// Query should return docs if:
// the interval is within the active session
// the active session begins or ends within the interval
// the active session is within the interval - special 'and' case of above
//
var query = {
    $or: [
        {start: {$gte: intervalStart, $lte: intervalFinish}},
        {finish: {$gte: intervalStart, $lte: intervalFinish}},
        {$and: [
            {start: {$lt: intervalStart}},
            {finish: {$gt: intervalFinish}}
        ]}
    ]
}
// Verify all 5 use cases found
db.timeSeries1.find(query)
// Verify index coverage - each stage is an IXSCAN
db.timeSeries1.explain().find(query)
// Verify that executionStats totalKeysExamined is not much more than
// nReturned.
// Examine execution times
db.timeSeries1.explain("executionStats").find(query)
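The script above only finds the matching documents. As a rough sketch of the remaining part of the question (the overlap length per 30-second interval), the per-document arithmetic could look like the following plain JavaScript. Everything here is an assumption for illustration: the names `overlapSeconds` and `overlapsPerBucket`, the `origin` parameter that anchors the first bucket, and the choice of half-open buckets `[bucketStart, bucketStart + width)`, which differs from the inclusive `[0,29]` convention used in the test script.

```javascript
// Sketch only; all names and the half-open bucket convention are assumptions.
const BUCKET_SECONDS = 30;

// Length of the intersection of [start, finish) and [bucketStart, bucketEnd).
function overlapSeconds(start, finish, bucketStart, bucketEnd) {
  return Math.max(0, Math.min(finish, bucketEnd) - Math.max(start, bucketStart));
}

// Walk each document across the buckets it touches and accumulate
// a document count and total overlap seconds per bucket start.
function overlapsPerBucket(docs, origin, width = BUCKET_SECONDS) {
  const totals = new Map(); // bucketStart -> { count, seconds }
  for (const { start, finish } of docs) {
    const first = origin + Math.floor((start - origin) / width) * width;
    for (let b = first; b < finish; b += width) {
      const seconds = overlapSeconds(start, finish, b, b + width);
      if (seconds === 0) continue;
      const entry = totals.get(b) || { count: 0, seconds: 0 };
      entry.count += 1;
      entry.seconds += seconds;
      totals.set(b, entry);
    }
  }
  return totals;
}

// The sample record from the question, with buckets anchored at 13653506600.
const sample = overlapsPerBucket(
  [{ start: 13653506610, finish: 13653506650 }],
  13653506600
);
```

With that anchor, the sample record contributes 20 seconds to the bucket starting at 13653506600 and 20 seconds to the one starting at 13653506630. Whether this loop is better run client-side over a cursor or expressed inside an aggregation pipeline is left open here.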