Finding pairs of documents in a MongoDB collection

Time: 2016-03-15 22:02:58

Tags: mongodb mongodb-query aggregation-framework

I have a collection of documents with the following structure:

id: ObjectId
name: String
placeSeen: String
dateTimeSeen: Date

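I need to find pairs of documents, matched on name, that each represent a "trip". The goal is to see the travel time from one point to another. People can go from anywhere to anywhere they want.

E.g. (using the sample data below): I need results that make it easy to derive information such as: "John went from A1 to B1, and it took him 2 minutes. John went from B1 to C1, and it took him 2 minutes. John went from C1 to A1, and it took him 3 minutes."

Currently I'm thinking of doing this by iterating over the full collection; for each document, I could search for the first document with a matching name and a different placeSeen, sorted by dateTimeSeen ascending. It would kind of work, but it doesn't seem really efficient - many rows to iterate.

What would be a better approach?

Sample data: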

{ 
    "_id" : ObjectId("56e933a186983c6f2978e8a1"), 
    "name" : "John", 
    "placeSeen" : "A1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:25:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9354486983c6f2978e8a9"), 
    "name" : "John", 
    "placeSeen" : "B1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:27:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9355186983c6f2978e8ab"), 
    "name" : "John", 
    "placeSeen" : "C1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:29:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9355186983c6f2978e8ac"), 
    "name" : "John", 
    "placeSeen" : "A1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:32:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9358186983c6f2978e8ad"), 
    "name" : "Sue", 
    "placeSeen" : "B1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:21:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9358c86983c6f2978e8af"), 
    "name" : "Sue", 
    "placeSeen" : "A1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:24:41.000+0000")
}
{ 
    "_id" : ObjectId("56e9359686983c6f2978e8b1"), 
    "name" : "Sue", 
    "placeSeen" : "C1", 
    "dateTimeSeen" : ISODate("2016-03-16T10:29:41.000+0000")
}

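1 answer:

Answer 0 (score: 1)

You can do this with aggregation. The key is figuring out how to pair up the dates/places; grouping by person is the easy part.

I used your sample data, but I added another data point for "Sue" at a place she had visited before - this shows that it works with repeat visits as long as the time is checked correctly.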

db.went.find({},{_id:0})
{ "name" : "John", "placeSeen" : "A1", "dateTimeSeen" : ISODate("2016-03-16T10:25:41Z") }
{ "name" : "John", "placeSeen" : "B1", "dateTimeSeen" : ISODate("2016-03-16T10:27:41Z") }
{ "name" : "John", "placeSeen" : "C1", "dateTimeSeen" : ISODate("2016-03-16T10:29:41Z") }
{ "name" : "Sue", "placeSeen" : "B1", "dateTimeSeen" : ISODate("2016-03-16T10:21:41Z") }
{ "name" : "Sue", "placeSeen" : "A1", "dateTimeSeen" : ISODate("2016-03-16T10:24:41Z") }
{ "name" : "Sue", "placeSeen" : "C1", "dateTimeSeen" : ISODate("2016-03-16T10:29:41Z") }
{ "name" : "Sue", "placeSeen" : "B1", "dateTimeSeen" : ISODate("2016-03-16T10:35:00Z") }
{ "name" : "John", "placeSeen" : "A1", "dateTimeSeen" : ISODate("2016-03-16T10:32:41Z") }

Here is the aggregation:

db.went.aggregate( [
    /* we want time to be sorted for each person in the next step */
    {$sort:{name:1,dateTimeSeen:1}}, 
    /* group each person's places and times into a single document */
    {$group:{ _id:"$name", places:{$push:{place:"$placeSeen",time:"$dateTimeSeen"}}}},
    /* this duplicates the "places" arrays into identical field "trips" */
    {$project:{trips:"$places",places:1}},
    /* unwind one of the arrays */
    {$unwind:"$places"},
    /* $filter keeps only elements of "trips" that are "later" than "place", 
     * then we only want the first element of remaining ones */ 
    {$project:{ "places":1, 
                "to": {$arrayElemAt:[ 
                   {$filter: {
                      input:"$trips",
                      as:"trip",
                      cond:{$and:[
                          {$ne:["$places.place","$$trip.place"]},
                          {$lt:["$places.time","$$trip.time"]}
                      ]}
                   }},
                   0
                ]}
    }},
    /* if "to" is null then it's the last point (no destination, remove) */
    {$match:{to:{$ne:null}}}, 
    /* format the "trip" output and calculate duration */
    {$project:{ _id:0, 
                name:"$_id",
                trip:{$concat:["$places.place","-","$to.place"]},
                durationSeconds:{$divide:[{$subtract:["$to.time","$places.time"]},1000]}
    }}
] )

Output:

{ "name" : "Sue", "trip" : "B1-A1", "durationSeconds" : 180 }
{ "name" : "Sue", "trip" : "A1-C1", "durationSeconds" : 300 }
{ "name" : "Sue", "trip" : "C1-B1", "durationSeconds" : 319 }
{ "name" : "John", "trip" : "A1-B1", "durationSeconds" : 120 }
{ "name" : "John", "trip" : "B1-C1", "durationSeconds" : 120 }
{ "name" : "John", "trip" : "C1-A1", "durationSeconds" : 180 }
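
The durationSeconds values follow from the fact that $subtract on two dates returns the difference in milliseconds, which the pipeline then divides by 1000. Plain JavaScript Date subtraction behaves the same way, so Sue's C1-B1 row can be checked by hand. This is only an illustrative check; the variable names are made up for the example.

```javascript
// Timestamps copied from Sue's last two sightings in the test data above.
const seenAtC1 = new Date("2016-03-16T10:29:41Z");
const seenAtB1 = new Date("2016-03-16T10:35:00Z");

// Subtracting two Date objects yields milliseconds, like $subtract on two dates.
const durationMs = seenAtB1 - seenAtC1;

// Same conversion as the {$divide:[..., 1000]} step in the last $project.
const durationSeconds = durationMs / 1000;

console.log(durationSeconds); // 319 (5 minutes 19 seconds)
```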

You must be on version 3.2.x or later - I'm using several aggregation expressions that were introduced in 3.2.0.
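
To check the pairing logic without a MongoDB server, the stages can be mimicked in plain JavaScript. This is a rough sketch, not what the server actually executes: `sightings` is a hypothetical in-memory copy of the test data and `findTrips` is a made-up helper name, not part of any MongoDB API.

```javascript
// Hypothetical in-memory copy of the answer's test data (not a MongoDB call).
const sightings = [
  { name: "John", placeSeen: "A1", dateTimeSeen: new Date("2016-03-16T10:25:41Z") },
  { name: "John", placeSeen: "B1", dateTimeSeen: new Date("2016-03-16T10:27:41Z") },
  { name: "John", placeSeen: "C1", dateTimeSeen: new Date("2016-03-16T10:29:41Z") },
  { name: "Sue", placeSeen: "B1", dateTimeSeen: new Date("2016-03-16T10:21:41Z") },
  { name: "Sue", placeSeen: "A1", dateTimeSeen: new Date("2016-03-16T10:24:41Z") },
  { name: "Sue", placeSeen: "C1", dateTimeSeen: new Date("2016-03-16T10:29:41Z") },
  { name: "Sue", placeSeen: "B1", dateTimeSeen: new Date("2016-03-16T10:35:00Z") },
  { name: "John", placeSeen: "A1", dateTimeSeen: new Date("2016-03-16T10:32:41Z") },
];

// Illustrative helper: pairs each sighting with the person's next sighting elsewhere.
function findTrips(docs) {
  // Same role as $sort + $group: one time-ordered list of sightings per person.
  const byName = new Map();
  const sorted = [...docs].sort((a, b) => a.dateTimeSeen - b.dateTimeSeen);
  for (const doc of sorted) {
    if (!byName.has(doc.name)) byName.set(doc.name, []);
    byName.get(doc.name).push(doc);
  }

  const trips = [];
  for (const [name, places] of byName) {
    // Same role as $unwind: visit each sighting as a possible starting point.
    for (const from of places) {
      // Same role as $filter + $arrayElemAt: first later sighting at a different place.
      const to = places.find(
        (p) => p.placeSeen !== from.placeSeen && p.dateTimeSeen > from.dateTimeSeen
      );
      if (!to) continue; // same role as the $match on "to": no destination, skip

      trips.push({
        name,
        trip: `${from.placeSeen}-${to.placeSeen}`,
        durationSeconds: (to.dateTimeSeen - from.dateTimeSeen) / 1000,
      });
    }
  }
  return trips;
}

console.log(findTrips(sightings));
```

With the test data this yields the same six trips as the aggregation output. Note that every sighting is compared against the person's whole list, so the work grows with the square of the number of sightings per person; the same is true of the $filter step in the pipeline.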