How do I find a document's last occurrence based on another document's field in Mongo?

Time: 2017-08-09 07:53:25

Tags: mongodb

Let's start with some test data:

db.events.insertMany([
{
    datetime: ISODate("2017-01-01 00:00:00"),
    results: [
        { id: 2, note: "SK4 on alarm" },
        { id: 5, note: "UT7 on alarm" }
    ]
},
{
    datetime: ISODate("2017-01-01 00:15:00"),
    results: [
        { id: 2, note: "SK4 on alarm" },
        { id: 5, note: "UT7 on alarm" }
    ]
},
{
    datetime: ISODate("2017-01-01 00:30:00"),
    results: [
        { id: 5, note: "UT7 on alarm" }
    ]
},
{
    datetime: ISODate("2017-01-01 00:45:00"),
    results: []
},
{
    datetime: ISODate("2017-01-01 01:00:00"),
    results: [
        { id: 5, note: "UT7 on alarm" }
    ]
},
{
    datetime: ISODate("2017-01-01 01:15:00"),
    results: []
},
]);

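I would like to create a Gantt chart for the events above. My Gantt chart needs two things:

  • The title of the entry
  • The start and end date of the entry

So from the test data above I need to generate output like this: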

[
    { 
        "starttime": "2017-01-01 00:00:00",
        "endtime": "2017-01-01 00:30:00",
        "title": "SK4 on alarm"
    },
    {
        "starttime": "2017-01-01 00:00:00",
        "endtime": "2017-01-01 00:45:00",
        "title": "UT7 on alarm"
    },
    {
        "starttime": "2017-01-01 01:00:00",
        "endtime": "2017-01-01 01:15:00",
        "title": "UT7 on alarm"
    }
]

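As you can see, I need a list of the ongoing alarms, each with the start date at which it first appeared and the end date at which it disappeared. Each alarm may occur multiple times in the history, as "UT7 on alarm" does.

I need to form a MongoDB query that returns the same output as above.

I may have a theory, but I cannot translate it into an actual Mongo query. To determine the end date of each event, I need to write a query that finds the document with the smallest datetime after a given date (the event's datetime) in which the given id is not present in the results array. This is the step I don't know how to do.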
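To make the rule concrete, here is a rough sketch in plain JavaScript rather than a Mongo query. It assumes the test data has been copied by hand into an in-memory array with the datetimes as strings, and findEndTimes is a name invented purely for illustration:

```javascript
// Hypothetical in-memory copy of the test data. Datetimes are kept as
// "YYYY-MM-DD HH:MM:SS" strings, which sort correctly as plain text.
const SK4 = { id: 2, note: "SK4 on alarm" };
const UT7 = { id: 5, note: "UT7 on alarm" };
const events = [
  { datetime: "2017-01-01 00:00:00", results: [SK4, UT7] },
  { datetime: "2017-01-01 00:15:00", results: [SK4, UT7] },
  { datetime: "2017-01-01 00:30:00", results: [UT7] },
  { datetime: "2017-01-01 00:45:00", results: [] },
  { datetime: "2017-01-01 01:00:00", results: [UT7] },
  { datetime: "2017-01-01 01:15:00", results: [] },
];

// For every alarm occurrence, find the earliest later document whose
// results array does not contain that alarm's id.
// (findEndTimes is an illustrative helper, not a MongoDB function.)
function findEndTimes(docs) {
  const sorted = [...docs].sort((a, b) => a.datetime.localeCompare(b.datetime));
  const rows = [];
  sorted.forEach((doc, i) => {
    for (const alarm of doc.results) {
      const end = sorted
        .slice(i + 1) // only documents after the current one
        .find((later) => !later.results.some((r) => r.id === alarm.id));
      rows.push({
        starttime: doc.datetime,
        endtime: end ? end.datetime : null, // null: the alarm never cleared
        title: alarm.note,
      });
    }
  });
  return rows;
}

const rows = findEndTimes(events);
console.log(rows);
```

On the test data this produces six rows, one per alarm occurrence.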

After this query I would have something like this:

[
    { 
        "starttime": "2017-01-01 00:00:00",
        "endtime": "2017-01-01 00:30:00",
        "title": "SK4 on alarm"
    },
    {
        "starttime": "2017-01-01 00:00:00",
        "endtime": "2017-01-01 00:45:00",
        "title": "UT7 on alarm"
    },
    { 
        "starttime": "2017-01-01 00:15:00",
        "endtime": "2017-01-01 00:30:00",
        "title": "SK4 on alarm"
    },
    {
        "starttime": "2017-01-01 00:15:00",
        "endtime": "2017-01-01 00:45:00",
        "title": "UT7 on alarm"
    },
    {
        "starttime": "2017-01-01 00:30:00",
        "endtime": "2017-01-01 00:45:00",
        "title": "UT7 on alarm"
    },
    {
        "starttime": "2017-01-01 01:00:00",
        "endtime": "2017-01-01 01:15:00",
        "title": "UT7 on alarm"
    }
]

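After this step, I need to filter out the duplicate entries. My idea is to group the items by "title, endtime" and take the lowest starttime. I think this would give me the right result.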
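Outside the database, that grouping idea could be sketched in plain JavaScript roughly like this. The rows are the intermediate list above typed in by hand, and mergeByEnd is a made-up helper name, so treat it as an illustration of the idea rather than a solution:

```javascript
// The six intermediate rows, written compactly as [starttime, endtime, title].
const intermediate = [
  ["2017-01-01 00:00:00", "2017-01-01 00:30:00", "SK4 on alarm"],
  ["2017-01-01 00:00:00", "2017-01-01 00:45:00", "UT7 on alarm"],
  ["2017-01-01 00:15:00", "2017-01-01 00:30:00", "SK4 on alarm"],
  ["2017-01-01 00:15:00", "2017-01-01 00:45:00", "UT7 on alarm"],
  ["2017-01-01 00:30:00", "2017-01-01 00:45:00", "UT7 on alarm"],
  ["2017-01-01 01:00:00", "2017-01-01 01:15:00", "UT7 on alarm"],
].map(([starttime, endtime, title]) => ({ starttime, endtime, title }));

// Group by (title, endtime) and keep the row with the lowest starttime.
// (mergeByEnd is a hypothetical helper name.)
function mergeByEnd(items) {
  const groups = new Map();
  for (const item of items) {
    const key = `${item.title}|${item.endtime}`;
    const best = groups.get(key);
    // String comparison is enough because the timestamps share one format.
    if (!best || item.starttime < best.starttime) {
      groups.set(key, { ...item });
    }
  }
  return [...groups.values()];
}

const merged = mergeByEnd(intermediate);
console.log(merged);
```

With the six rows above, this leaves three entries: one for "SK4 on alarm" and two for "UT7 on alarm".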

This is not homework.

Any help would be much appreciated!

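2 answers:

Answer 0: (score: 1)

Sir, I went out on a limb for your question! I spent a few hours coming up with the following, which (luckily) produces what you want.

Frankly, I hope it inspires someone smarter to come up with something leaner. I went in all sorts of directions, but this is the only way I could get this thing to work.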

db.events.aggregate({
    // first unwind to get one document per event
    $unwind: {
        path: "$results",
        preserveNullAndEmptyArrays: true // we need this in order to keep the dates for the empty ("no results") events
    }
}, {
    // we start building up some lookup data structure which we will need later
    $group: {
        "_id": "$datetime",  // for every date...
        "allResultIds": { $addToSet: "$results.id" }, // ...we want to capture the event ids and exclude duplicates
        "docs": { $push: "$$ROOT" } // ...and keep track of all documents we encounter
    }
}, {
    // we got to make sure that our events are nicely sorted to allow our following stages to work properly
    $sort: { "_id": 1 } // order by datetime
}, {
    // now, we produce the final lookup structure to help us later
    $group: {
        "_id": null, // we do not really want to group by anything; null collects all documents into a single group
        "magicLookup": {
            $push: { "datetime": "$_id", "allResultIds": "$allResultIds" } // here is where we put the final lookup structure together
        },
        "docs": {
            $push: "$$ROOT.docs" // as always, we want to keep track of all documents
        }
    }
}, {
    $unwind: "$docs" // flatten result
}, {
    $unwind: "$docs" // flatten result again ;)
}, {
    $project: // restore original document structure but this time with the lookup included in every document
    {
        "magicLookup": 1,
        "_id": "$docs._id", // this is not even needed
        "datetime": "$docs.datetime",
        "results": "$docs.results",
    }
}, {
    // let's filter out documents with no results
    $match: {
        "results": { $exists: true }
    }
}, {
    // now, we can find the end date for all our events using the logic you described (first event after the current one without the same result id)
    $project: {
        "datetime": 1, // we want to keep the datetime information
        "results": 1, // the same for the results
        "endtime": {
            $min: { // find the minimum - luckily, this seems to work
                $filter: { // exclude all events for the result id we are looking at just now
                    input: {
                        $slice: [ // look at the documents *after* the current one - this is why we need the sorting stage before
                            "$magicLookup", // from the magic lookup
                            { $add: [ { $indexOfArray: [ "$magicLookup.datetime", "$datetime" ] }, 1 ] }, // we want everything *after* the current event
                            { $size: "$magicLookup" } // up to a maximum of, well, err, the array length - whatever. This could be a hardcoded number or written more beautifully but I couldn't be bothered
                        ]
                    },
                    cond: {
                        $not: {
                            $in: [
                                "$results.id", "$$this.allResultIds"
                            ]
                        }
                    }
                }
            }
        }
    }
}, {
    // (almost) lastly, we apply a little trick
    $group:
    {
        "_id": { "note": "$results.note", endtime: "$endtime.datetime" },
        "starttime": { $min: "$datetime" } // the smallest of all our event dates with the same end date is our start date
    }
}, {
    // let's beautify the output a little
    $project: {
        "_id": 0,
        "starttime": "$starttime",
        "endtime": "$_id.endtime",
        "note": "$_id.note",
    }
}
)

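Answer 1: (score: 1)

So, based on dnickless's answer, which is awesome, I came up with this solution. It is almost the same length, except that I did not use $filter, $$ROOT, $slice and $addToSet, which I am not really familiar with.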

db.events.aggregate([
    // Make sure everything is in historical order
    { 
        $sort: { datetime: 1 } 
    },
    // Build a single document where $left stores the original documents and $right stores only the event IDs seen at a given time
    {
        $group: {
            _id: null,
            left: { $push: { datetime: "$datetime", events: "$results" } },
            right: { $push: { datetime: "$datetime", events: "$results.id" } }
        }
    },
    // Flatten by $left so each original document will have all the other documents (this is now like a SQL join)
    {
        $unwind: {
            path: "$left",
            preserveNullAndEmptyArrays: true
        }        
    },
    // Flatten by $left.events so that each event that occurred has its own document tree
    {
        $unwind: "$left.events"
    },
    // Flatten by $right too so that we have a Cartesian product for each event that occurred (this is needed for the following filtering)
    {
        $unwind: {
            path: "$right",
            preserveNullAndEmptyArrays: true
        }
    },
    // Calculate fields for filtering: $dategt tells whether the $right document is later in time than the $left event,
    // and $alarmoff tells whether the $left event is no longer present at the time stored in $right
    {
        $project: {
            left: 1,
            right: 1,
            dategt: { $gt: [ "$right.datetime", "$left.datetime" ] },
            alarmoff: { $not: { $in: [ "$left.events.id", "$right.events" ] } }
        }
    },
    // Filter out irrelevant documents
    {
        $match: {
            dategt: { $eq: true },
            alarmoff: { $eq: true }
        }
    },
    // Let's put the documents back together so that each event that occurred in $left gets the minimum date from $right at which that event had disappeared
    {
        $group: {
            _id: {
                "datetime": "$left.datetime",
                "id": "$left.events.id",
                "note": "$left.events.note"                
            },
            "right": { $min: "$right.datetime" }
        }
    },
    // $group does not guarantee any output order, so we sort again before the next grouping
    {
        $sort: {
            "_id.datetime": 1,
            "right": 1            
        }
    },
    // Need to group the documents again but now get the minimum start time of $left for each end time of $right
    {
        $group: {
            "_id": {
                endtime: "$right",
                id: "$_id.id",
                note: "$_id.note"
            },
            starttime: { $min: "$_id.datetime" }
        }
    },
    // Let's beautify the output a little
    {
        $project: {
            _id: 0,
            starttime: 1,
            endtime: "$_id.endtime",
            note: "$_id.note",
        }
    }
])