MongoDB: flag whether a document is the latest revision for a field value, and filter on the result

Time: 2016-12-16 10:41:37

Tags: mongodb mongodb-query aggregation-framework

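I am trying to port an existing SQL schema to Mongo. We have a documents table that sometimes holds the same document several times, with different revisions but the same reference. I want to get only the latest revision of each document.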

Sample input data:

{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC305",
    "code" : "305-D",
    "title" : "Document 305",
    "creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
    "creator" : "X"
},
{
    "Uid" : "xxx",
    "status" : "COMMENTED",
    "reference" : "DOC306",
    "code" : "306-A",
    "title" : "Document 306",
    "creationdate" : ISODate("2011-11-28T07:23:18.807Z"),
    "creator" : "X"
},
{
    "Uid" : "xxx",
    "status" : "COMMENTED",
    "reference" : "DOC306",
    "code" : "306-B",
    "title" : "Document 306",
    "creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
    "creator" : "X"
},
{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC501",
    "code" : "501-A",
    "title" : "Document 501",
    "creationdate" : ISODate("2011-11-19T06:30:35.757Z"),
    "creator" : "X"
},
{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC501",
    "code" : "501-B",
    "title" : "Document 501",
    "creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
    "creator" : "X"
}

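Given this data, I want the following result set (sometimes I want only the latest revision, and sometimes I want all revisions, each with an attribute telling me whether it is the latest one):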

{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC305",
    "code" : "305-D",
    "title" : "Document 305",
    "creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
    "creator" : "X",
    "lastrev" : true
},
{
    "Uid" : "xxx",
    "status" : "COMMENTED",
    "reference" : "DOC306",
    "code" : "306-B",
    "title" : "Document 306",
    "creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
    "creator" : "X",
    "lastrev" : true
},
{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC501",
    "code" : "501-B",
    "title" : "Document 501",
    "creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
    "creator" : "X",
    "lastrev" : true
}

I already have a bunch of filters, a sort, and skip/limit (for paging the data), so the final result set must respect these constraints.

The current find query (built with the .NET driver) filters correctly but returns every revision of each document:

coll.find(
    { "$and" : [
        { "$or" : [
            { "deletedid" : { "$exists" : false } },
            { "deletedid" : null }
        ] },
        { "$or" : [
            { "taskid" : { "$exists" : false } },
            { "taskid" : null }
        ] },
        { "objecttypeuid" : { "$in" : ["xxxxx"] } }
    ] },
    { "_id" : 0, "Uid" : 1, "lastrev" : 1, "title" : 1, "code" : 1, "creator" : 1, "owner" : 1, "modificator" : 1, "status" : 1, "reference": 1, "creationdate": 1 }
).sort({ "creationdate" : 1 }).skip(0).limit(10);

Using another question, I was able to build this aggregation, which gives me the latest revision of each document but without enough attributes in the result:

coll.aggregate([
    { $sort: { "creationdate": 1 } },
    {
        $group: {
            "_id": "$reference",
            result: { $last: "$creationdate" },
            creationdate: { $last: "$creationdate" }
        }
    }
]);

I want to integrate the aggregation with the find query.

2 answers:

Answer 0: (score: 0)

I found a way to combine the aggregation with the filtering:

coll.aggregate(
[
    { $match: {
            "$and" : [
                { "$or" : [
                    { "deletedid" : { "$exists" : false } },
                    { "deletedid" : null }
                ] },
                { "$or" : [
                    { "taskid" : { "$exists" : false } },
                    { "taskid" : null }
                ] },
                { "objecttypeuid" : { "$in" : ["xxx"] } }
            ]
        }
    },
    { $sort: { "creationdate": 1 } },
    { $group: {
            "_id": "$reference",
            "doc": { "$last": "$$ROOT" }
        }
    },
    { $sort: { "doc.creationdate": 1 } },
    { $skip: skip },
    { $limit: limit }
],
    { allowDiskUse: true }
);

For each result, this gives me a "doc" node containing the document data. It still returns too much data (the projection is missing), but it is a start.
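The missing projection could be added along the following lines. This is only a sketch: buildLatestRevisionPipeline is a hypothetical helper name, the stages are built as plain objects so nothing here talks to a server, and the $replaceRoot stage assumes MongoDB 3.4 or later (on older servers, a $project over "doc.title"-style paths would be needed instead).

```javascript
// Hypothetical helper: returns the aggregation stages as plain objects.
function buildLatestRevisionPipeline(filter, fields, skip, limit) {
  // Turn ["title", "code"] into { _id: 0, title: 1, code: 1 }.
  const projection = { _id: 0 };
  for (const name of fields) {
    projection[name] = 1;
  }

  return [
    { $match: filter },
    // Oldest first, so that $last sees the newest revision of each reference.
    { $sort: { creationdate: 1 } },
    { $group: { _id: "$reference", doc: { $last: "$$ROOT" } } },
    // Assumption: MongoDB 3.4+. Promotes "doc" back to the top level.
    { $replaceRoot: { newRoot: "$doc" } },
    { $project: projection },
    // $group does not preserve order, so sort again before paging.
    { $sort: { creationdate: 1 } },
    { $skip: skip },
    { $limit: limit },
  ];
}

const pipeline = buildLatestRevisionPipeline(
  { objecttypeuid: { $in: ["xxx"] } },
  ["Uid", "title", "code", "status", "reference", "creationdate"],
  0,
  10
);
console.log(JSON.stringify(pipeline, null, 2));
```

The resulting array can be passed as the first argument of coll.aggregate(...), with { allowDiskUse: true } as before.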

.NET translation:

FilterDefinitionBuilder<BsonDocument> filterBuilder = Builders<BsonDocument>.Filter;
FilterDefinition<BsonDocument> filters = filterBuilder.Empty;

filters = filters & (filterBuilder.Not(filterBuilder.Exists("deletedid")) | filterBuilder.Eq("deletedid", BsonNull.Value));
filters = filters & (filterBuilder.Not(filterBuilder.Exists("taskid")) | filterBuilder.Eq("taskid", BsonNull.Value));
foreach (var f in fieldFilters) {
    filters = filters & filterBuilder.In(f.Key, f.Value);
}

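// Sort by creation date before grouping, so that $last picks the newest revision.
var sort = Builders<BsonDocument>.Sort.Ascending("creationdate");
// After $group the original fields live under "doc", so the paging sort needs that prefix.
var groupedSort = Builders<BsonDocument>.Sort.Ascending("doc." + orderby);

var group = new BsonDocument {
    { "_id", "$reference" },
    { "doc", new BsonDocument("$last", "$$ROOT") }
};

var aggregate = coll.Aggregate(new AggregateOptions { AllowDiskUse = true })
    .Match(filters)
    .Sort(sort)
    .Group(group)
    .Sort(groupedSort)
    .Skip(skip)
    .Limit(rows);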

return aggregate.ToList();

I am fairly sure there is a better way to do this.

Answer 1: (score: 0)

Your answer is very close. Use $max instead of $last.

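About the $last operator:

"Returns the value that results from applying an expression to the last document in a group of documents that share the same group by field. Only meaningful when documents are in a defined order."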
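The ordering caveat in that quote can be illustrated without a server. The sketch below is a plain JavaScript emulation, not MongoDB code: groupLast and groupMaxByDate are hypothetical helpers that mimic what a $last and a $max accumulator would pick for each reference when the input is not sorted by creationdate.

```javascript
// Two revisions of DOC306, deliberately listed newest first (not sorted ascending).
const unsorted = [
  { reference: "DOC306", code: "306-B", creationdate: new Date("2011-11-28T07:26:49.447Z") },
  { reference: "DOC306", code: "306-A", creationdate: new Date("2011-11-28T07:23:18.807Z") },
];

// Mimics $last: keeps whichever document of each group comes last in input order.
function groupLast(docs) {
  const groups = new Map();
  for (const d of docs) {
    groups.set(d.reference, d); // later documents overwrite earlier ones
  }
  return groups;
}

// Mimics $max on the date: keeps the document with the greatest creationdate.
function groupMaxByDate(docs) {
  const groups = new Map();
  for (const d of docs) {
    const current = groups.get(d.reference);
    if (!current || d.creationdate > current.creationdate) {
      groups.set(d.reference, d);
    }
  }
  return groups;
}

console.log(groupLast(unsorted).get("DOC306").code);      // 306-A (depends on input order)
console.log(groupMaxByDate(unsorted).get("DOC306").code); // 306-B (independent of input order)
```

With unsorted input the "last" document is the older revision, while the maximum by date is still the newest one. That is why $last needs the preceding $sort stage and $max does not.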

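To get the latest revision in each group, see the following mongo shell code. Note that creationdate is listed first inside the $max document: embedded documents are compared field by field in order, so the first field decides which revision is the maximum.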

db.collection.aggregate([
  {
    $group: {
      _id: '$reference',
      doc: {
        $max: {
          "creationdate" : "$creationdate",
          "code" : "$code",
          "Uid" : "$Uid",
          "status" : "$status",
          "title" : "$title",
          "creator" : "$creator"
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      Uid: "$doc.Uid",
      status: "$doc.status",
      reference: "$_id",
      code: "$doc.code",
      title: "$doc.title",
      creationdate: "$doc.creationdate",
      creator: "$doc.creator"
    }
  }
]).pretty()

Your expected output:

{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC501",
    "code" : "501-B",
    "title" : "Document 501",
    "creationdate" : ISODate("2011-11-19T06:40:32.957Z"),
    "creator" : "X"
}
{
    "Uid" : "xxx",
    "status" : "COMMENTED",
    "reference" : "DOC306",
    "code" : "306-B",
    "title" : "Document 306",
    "creationdate" : ISODate("2011-11-28T07:26:49.447Z"),
    "creator" : "X"
}
{
    "Uid" : "xxx",
    "status" : "ACCEPTED",
    "reference" : "DOC305",
    "code" : "305-D",
    "title" : "Document 305",
    "creationdate" : ISODate("2011-11-24T15:13:28.887Z"),
    "creator" : "X"
}
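
The question also asks for the other variant: all revisions, each carrying a lastrev flag. One possible way to express that is sketched below. It has not been run against a server; buildFlaggedRevisionsPipeline is a hypothetical helper name, and because $push collects every revision of a reference into one array, it assumes each group stays well under the 16 MB document limit.

```javascript
// Hypothetical helper: returns stages that emit every revision plus a lastrev flag.
function buildFlaggedRevisionsPipeline(filter) {
  return [
    { $match: filter },
    {
      $group: {
        _id: "$reference",
        latest: { $max: "$creationdate" }, // newest date per reference
        revisions: { $push: "$$ROOT" },    // keep every revision of the group
      },
    },
    // One output document per revision again.
    { $unwind: "$revisions" },
    {
      $project: {
        _id: 0,
        Uid: "$revisions.Uid",
        status: "$revisions.status",
        reference: "$_id",
        code: "$revisions.code",
        title: "$revisions.title",
        creationdate: "$revisions.creationdate",
        creator: "$revisions.creator",
        // true only for the revision whose date equals the group's maximum
        lastrev: { $eq: ["$revisions.creationdate", "$latest"] },
      },
    },
    { $sort: { creationdate: 1 } },
  ];
}

const flaggedPipeline = buildFlaggedRevisionsPipeline({ objecttypeuid: { $in: ["xxx"] } });
console.log(JSON.stringify(flaggedPipeline, null, 2));
```

Appending { $match: { lastrev: true } } to these stages would reduce the output to the latest revisions only, and $skip and $limit stages can follow the final $sort for paging.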