MongoDB - How to find records with overlapping intervals?

Date: 2015-10-24 20:57:16

Tags: mongodb mongodb-query

I have nearly 70 million records in a MongoDB collection, with fields that include (among others)

start: 13653506610,
finish: 13653506650  

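(The values are Unix epoch seconds, in case that matters.) For every 30-second interval from the start of the collection to the end, I want to find and aggregate the records that overlap the interval, including the length of each overlap. What is the best way to do this?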

I created an index of the form

db.coll.ensureIndex({start: 1, finish: 1})

but even with this index, a query of the form

db.coll.find({start: {$lt: 13653506630}, finish: {$gte: 13653506600}})   

takes two minutes. There must be a better way!

1 answer:

Answer 0: (score: 1)

This is an interesting one. Thanks for the question.

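Note: this answer only finds the documents whose period intersects the interval being evaluated (the bottom part of the question). That would be one step in an aggregation pipeline that accomplishes the top part of the question, which is a much bigger problem. You would need to ask a more complete question to get a full answer.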
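For the overlap length that the question asks about, the arithmetic can be sketched in plain JavaScript (runnable in Node, or pasted into the mongo shell). This is an illustration only and not part of the tested script. The function names overlapSeconds and perWindowOverlap are made up here, and the sketch assumes half-open ranges, [start, finish) for a record and [windowStart, windowStart + width) for a window, which the question does not specify.

```javascript
// Length of the intersection of [start, finish) and [windowStart, windowEnd),
// in the same unit as the inputs. Returns 0 when the ranges do not intersect.
function overlapSeconds(start, finish, windowStart, windowEnd) {
  return Math.max(0, Math.min(finish, windowEnd) - Math.max(start, windowStart));
}

// Split one record across fixed-width windows that begin at `origin`
// (hypothetically, the smallest start value in the collection).
// Returns one entry per window the record actually overlaps.
function perWindowOverlap(start, finish, origin, width) {
  var result = [];
  // First window whose range could contain `start`.
  var first = origin + Math.floor((start - origin) / width) * width;
  for (var w = first; w < finish; w += width) {
    var seconds = overlapSeconds(start, finish, w, w + width);
    if (seconds > 0) {
      result.push({windowStart: w, seconds: seconds});
    }
  }
  return result;
}

// The sample record from the question, with 30-second windows
// assumed to begin at 13653506600.
var pieces = perWindowOverlap(13653506610, 13653506650, 13653506600, 30);
console.log(JSON.stringify(pieces));
```

With the sample record from the question, this yields 20 seconds in the window starting at 13653506600 and 20 seconds in the window starting at 13653506630, which together account for the record's full 40 seconds. Summing these pieces per window is the aggregation step that this answer does not cover.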

I noticed that your query logic does not quite match your description, so I tried to guess what you are trying to do and built a test case.

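You should be able to open a mongo shell, run "use timeSeries", and then paste in the script below to verify the concept. The last few lines show how to debug the 70,000,000-document case, including index coverage and execution time.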

Note: mongo-hacker makes this kind of output easier to inspect.

// USE:
//   mongo timeSeries < thisFile

// Clean out previous runs during testing
db.timeSeries1.drop()

// Given a start/finish 30 sec interval, find all documents that were
// active at that time.

// timeSeries1 holds period in epoch seconds the session was active
// Index start and finish independently - our queries use them independently
// (createIndex replaces the deprecated ensureIndex helper)
db.timeSeries1.createIndex({start:1})
db.timeSeries1.createIndex({finish:1})

// ASSUME: intervals do not overlap [0,29] and [30,59]
var intervalStart = 13653506600;
var intervalFinish = 13653506629;

// Use cases - should find all 5
//  1. active session matches interval exactly
db.timeSeries1.insert({_id:1, start:intervalStart, finish:intervalFinish})
//  2. active session starts and ends within interval
db.timeSeries1.insert({_id:2, start:intervalStart+5, finish:intervalFinish-5})
//  3. active session starts before interval and ends during interval
db.timeSeries1.insert({_id:3, start:intervalStart-5, finish:intervalFinish-5})
//  4. active session starts during interval and ends after interval
db.timeSeries1.insert({_id:4, start:intervalStart+5, finish:intervalFinish+5})
//  5. active session starts before interval and ends after interval
db.timeSeries1.insert({_id:5, start:intervalStart-5, finish:intervalFinish+5})

// Query should return docs if:
//  the interval is within the active session
//  the active session begins or ends within the interval
//    the active session is within the interval - special 'and' case of above
//
var query = {
  $or: [
    {start: {$gte: intervalStart, $lte: intervalFinish}},
    {finish: {$gte: intervalStart, $lte: intervalFinish}},
    {$and: [
      {start: {$lt: intervalStart}},
      {finish: {$gt: intervalFinish}}
    ]}
  ]
}

// Verify all 5 use cases found
db.timeSeries1.find(query)

// Verify index coverage - each stage is an IXSCAN
db.timeSeries1.explain().find(query)

// Verify that executionStats totalKeysExamined is not much more than
// nReturned (few index keys scanned per document returned).
// Examine execution times
db.timeSeries1.explain("executionStats").find(query)
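As a sanity check on the query logic, the three-clause $or can be compared in memory against the usual two-condition overlap test (start <= intervalFinish and finish >= intervalStart). The sketch below is plain JavaScript and does not touch the database. The names matchesOrQuery, overlapsClosed and overlapQuery are invented for this illustration, and the equivalence assumes every document has start <= finish.

```javascript
// In-memory mirror of the three-clause $or query from the script above.
function matchesOrQuery(doc, intervalStart, intervalFinish) {
  return (doc.start >= intervalStart && doc.start <= intervalFinish) ||
         (doc.finish >= intervalStart && doc.finish <= intervalFinish) ||
         (doc.start < intervalStart && doc.finish > intervalFinish);
}

// Two-condition overlap test for closed ranges; assumes doc.start <= doc.finish.
function overlapsClosed(doc, intervalStart, intervalFinish) {
  return doc.start <= intervalFinish && doc.finish >= intervalStart;
}

// The same two conditions written as a MongoDB query document.
function overlapQuery(intervalStart, intervalFinish) {
  return {start: {$lte: intervalFinish}, finish: {$gte: intervalStart}};
}

var intervalStart = 13653506600;
var intervalFinish = 13653506629;

// The five use cases from the script, plus three extra cases added here:
// entirely before, entirely after, and touching the first second only.
var sampleDocs = [
  {_id: 1, start: intervalStart,      finish: intervalFinish},
  {_id: 2, start: intervalStart + 5,  finish: intervalFinish - 5},
  {_id: 3, start: intervalStart - 5,  finish: intervalFinish - 5},
  {_id: 4, start: intervalStart + 5,  finish: intervalFinish + 5},
  {_id: 5, start: intervalStart - 5,  finish: intervalFinish + 5},
  {_id: 6, start: intervalStart - 20, finish: intervalStart - 1},
  {_id: 7, start: intervalFinish + 1, finish: intervalFinish + 20},
  {_id: 8, start: intervalStart - 10, finish: intervalStart}
];

sampleDocs.forEach(function (doc) {
  var a = matchesOrQuery(doc, intervalStart, intervalFinish);
  var b = overlapsClosed(doc, intervalStart, intervalFinish);
  console.log(doc._id, a, b, a === b ? "agree" : "DISAGREE");
});
```

The two-condition form has the same shape as the query in the question, which was reported to take two minutes with the compound index, so treat it as a way to check correctness rather than as a speed-up. Whichever form you run against the real collection, compare the explain("executionStats") output for each.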