Question

我有一个查询，我需要返回10个＆＃34;类型A＆＃34;记录，同时返回所有其他记录。我怎么能做到这一点？

更新：不可否认，我可以通过两个查询执行此操作，但我想避免这种情况，如果可能的话，认为它可以减少开销，并且可能更高效。我的查询已经是一个聚合查询，它考虑了两种记录，我只需要限制结果中一种记录的数量。

更新：以下是突出显示问题的示例查询：

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Fiction"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder'}},
    {$limit: 10}
])

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Horror"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder'}},
    {$limit: 10}
])

db.books.aggregate([
    {$geoNear: {near: [-118.09771, 33.89244], distanceField: "distance", spherical: true}},
    {$match:    {"type": "Science"}},
    {$project:  {
        'title': 1,
        'author': 1,
        'type': 1,
        'typeSortOrder': 
            {$add: [
                {$cond: [{$eq: ['$type', "Fiction"]}, 1, 0]},
                {$cond: [{$eq: ['$type', "Science"]}, 0, 0]},
                {$cond: [{$eq: ['$type', "Horror"]}, 3, 0]}
        ]},
    }},
    {$sort: {'typeSortOrder'}},
    {$limit: 10}
])

我想在一个查询中返回所有这些记录，但是将类型限制为任何类别中的最多10个。我意识到当查询被这样分解时，typeSortOrder不需要是有条件的，我最初在那里查询是一个查询时（我想回到这里）。 / p>

Answer 1

我不认为目前（2.6）可以使用一个聚合管道。很难给出一个关于其原因的精确论据，但基本上聚合管道一次执行一个文档流的转换。在流的状态管道中没有意识到，这是你需要确定你已达到A，B＆B的限制等，并需要删除相同类型的其他文档。 $group确实将多个文档放在一起，并允许其汇总的字段值影响生成的组文档（$sum，$avg等）。也许这是有道理的，但它必然不严谨，因为您可以添加简单的操作，以便可以根据类型进行限制，例如，向$push x添加$group累加器如果被推送的数组少于x个元素，则仅推送该值。

即使我确实有办法，我也建议只进行两次聚合。保持简单。

Answer 2

这是MongoDB不支持的子查询/连接的经典案例。所有联接和子查询类操作都需要在应用程序逻辑中实现。因此，多个查询是您最好的选择。如果您有类型索引，那么多查询方法的性能应该很好。

或者，您可以编写单个聚合查询，减去类型匹配和限制子句，然后在应用程序逻辑中处理流以限制每种类型的文档。对于大型结果集，此方法的性能较低，因为文档可能以随机顺序返回。然后，您的限制逻辑将需要遍历整个结果集。

Answer 3

我猜你可以在游标上使用cursor.limit（）来指定游标返回的最大文档数。 limit（）类似于SQL数据库中的LIMIT语句。在从数据库中检索任何文档之前，必须对游标应用limit（）。

游标中的限制功能可用于限制查找中的记录数。

我想这个例子应该有所帮助：

var myCursor = db.bios.find( );

db.bios.find().limit( 5 )

Answer 4

问题

这里的结果并非不可能，但也可能不切实际。已经做了一般性的注意事项，你不能切片＆＃34;数组或其他＆＃34;限制＆＃34;结果的数量推到一个。并且按照＆＃34;类型＆＃34;执行此操作的方法本质上是使用数组。

＆＃34;不切实际＆＃34; part通常是关于结果的数量，其中太大的结果集会在＆＃34;分组＆＃34;时爆炸BSON文档限制。但是，我将在您的＆＃34; geo search＆＃34;上提出一些其他建议。最终目标是返回每个＆＃34;类型＆＃34;的10个结果。至多。

原理

首先要考虑并理解这个问题，让我们看一下简化的＆＃34; set＆＃34;返回＆＃34;前2个结果所需的数据和管道代码＆＃34;来自每种类型：

{ "title": "Title 1", "author": "Author 1", "type": "Fiction", "distance": 1 },
{ "title": "Title 2", "author": "Author 2", "type": "Fiction", "distance": 2 },
{ "title": "Title 3", "author": "Author 3", "type": "Fiction", "distance": 3 },
{ "title": "Title 4", "author": "Author 4", "type": "Science", "distance": 1 },
{ "title": "Title 5", "author": "Author 5", "type": "Science", "distance": 2 },
{ "title": "Title 6", "author": "Author 6", "type": "Science", "distance": 3 },
{ "title": "Title 7", "author": "Author 7", "type": "Horror", "distance": 1 }

这是数据的简化视图，在某种程度上代表了初始查询后的文档状态。现在就是如何使用聚合管道来获得最近的＆＃34;每个＆＃34;类型＆＃34;的两个结果：

db.books.aggregate([
    { "$sort": { "type": 1, "distance": 1 } },
    { "$group": {
        "_id": "$type",
        "1": { 
            "$first": {
                "_id": "$_id",
                "title": "$title",
                "author": "$author",
                "distance": "$distance"
            }
         },
         "books": {
             "$push": {
                "_id": "$_id",
                "title": "$title",
                "author": "$author",
                "distance": "$distance"
              }
         }
    }},
    { "$project": {
        "1": 1,
        "books": {
            "$cond": [
                { "$eq": [ { "$size": "$books" }, 1 ] },
                { "$literal": [false] },
                "$books"
            ]
        }
    }},
    { "$unwind": "$books" },
    { "$project": {
        "1": 1,
        "books": 1,
        "seen": { "$eq": [ "$1", "$books" ] }
    }},
    { "$sort": { "_id": 1, "seen": 1 } },
    { "$group": {
        "_id": "$_id",
        "1": { "$first": "$1" },
        "2": { "$first": "$books" },
        "books": {
            "$push": {
                "$cond": [ { "$not": "$seen" }, "$books", false ]
            }
        }
    }},
    { "$project": {
        "1": 1,
        "2": 2,
        "pos": { "$literal": [1,2] }
    }},
    { "$unwind": "$pos" },
    { "$group": {
        "_id": "$_id",
        "books": {
            "$push": {
                "$cond": [
                    { "$eq": [ "$pos", 1 ] },
                    "$1",
                    { "$cond": [
                        { "$eq": [ "$pos", 2 ] },
                        "$2",
                        false
                    ]}
                ]
            }
        }
    }},
    { "$unwind": "$books" },
    { "$match": { "books": { "$ne": false } } },
    { "$project": {
        "_id": "$books._id",
        "title": "$books.title",
        "author": "$books.author",
        "type": "$_id",
        "distance": "$books.distance",
        "sortOrder": {
            "$add": [
                { "$cond": [ { "$eq": [ "$_id", "Fiction" ] }, 1, 0 ] },
                { "$cond": [ { "$eq": [ "$_id", "Science" ] }, 0, 0 ] },
                { "$cond": [ { "$eq": [ "$_id", "Horror" ] }, 3, 0 ] }
            ]
        }
    }},
    { "$sort": { "sortOrder": 1 } }
])

当然这只是两个结果，但它概述了获取n结果的过程，这自然是在生成的管道代码中完成的。在进入代码之前，该过程值得一试。

在任何查询之后，这里要做的第一件事是$sort结果，而这要基本上通过＆＃34;分组键＆＃34;这是＆＃34;类型＆＃34;并且通过＆＃34;距离＆＃34;这样＆＃34;最近的＆＃34;物品在上面。

这一点的原因显示在$group阶段，将重复。所做的主要是＆＃34;弹出每个分组堆栈的$first结果。因此，其他文档不会丢失，使用$push将它们放在一个数组中。

为了安全起见，下一阶段实际上只需要在＆＃34;第一步＆＃34;之后，但可以选择添加以便在重复中进行类似的过滤。这里的主要检查是得到的＆＃34;数组＆＃34;大于一个项目。如果不是，则将内容替换为单个false值。其原因即将变得明显。

在此之后＆＃34;第一步＆＃34;真正的重复循环生物，然后该阵列被去标准化＆＃34;使用$unwind，然后$project制作，以便匹配＆＃34;最后一次见过＆＃34;。

的文件

由于只有一个文件符合这一条件，因此结果再次排序＆＃34;排序＆＃34;为了漂浮＆＃34;看不见的＆＃34;文档到顶部，当然保持分组顺序。接下来的事情类似于第一个$group步骤，但保持任何保持位置并且首先看不到＆＃34;文件被＃34;弹出堆栈＆＃34;试。

＆＃34;见过的文件＆＃34;然后被推回到数组而不是作为自身，而是作为false的值。这不会与保留的值相匹配，这通常是处理这种情况的方式而不是“破坏性的”＃34;如果没有足够的匹配来覆盖所需的n结果，那么您不希望操作失败的数组内容。

完成后清理，下一次＆＃34;预测＆＃34;将数组添加到现在按＆＃34;类型＆＃34;分组的最终文档中。代表所需n个结果中的每个位置。展开此数组时，文档可以再次组合在一起，但现在所有文档都在一个数组中可能包含多个false值，但n个元素长。

最后再次展开数组，使用$match过滤掉false值，并投影到所需的文档表单。

实用性

前面提到的问题是过滤结果的数量，因为可以推送到数组中的结果数量存在实际限制。这主要是BSON限制，但即使仍处于限制范围内，您也不会真正想要1000件物品。

这里的诀窍是保持最初的＃34;匹配＆＃34;足够小的切片操作＆＃34;变得实用。 $geoNear管道流程中有一些东西可以使这成为可能。

显而易见的是limit。默认情况下，这是100，但您显然希望获得以下范围内的内容：

（您可能匹配的类别数量）X（必填项）

但如果这实际上是一个不在1000年代的数字，那么这里已经有了一些帮助。

其他人是maxDistance和minDistance，其中基本上你将上下限放在＆＃34;远离＆＃34;寻找。最大界限是一般限制器，而最小界限在＆＃34; paging＆＃34;时是有用的，这是下一个帮手。

当＆＃34;向上分页＆＃34;时，您可以使用query参数来排除文档的_id值＆＃34;已经看过＆＃34;使用$nin查询。以同样的方式，minDistance可以填充＆＃34;最后一次见到＆＃34;最大距离，或至少最小的最大距离＆＃34;类型＆＃34;。这允许一些概念来过滤已经被＆＃34;看到＆＃34;并获得另一页。

这本身就是一个话题，但是为了使这个过程切合实际，这些是减少初始匹配的一般事项。

实施

返回＆＃34; 10的一般问题最多，每种类型＆＃34;显然需要一些代码才能生成流水线阶段。没有人愿意输入，实际上你可能想在某个时候改变这个数字。

现在到了可以生成怪物管道的代码。 JavaScript中的所有代码，但原则上易于翻译：

var coords = [-118.09771, 33.89244];

var key = "$type";
var val = {
    "_id": "$_id",
    "title": "$title",
    "author": "$author",
    "distance": "$distance"
};
var maxLen = 10;

var stack = [];
var pipe = [];
var fproj = { "$project": { "pos": { "$literal": []  } } };

pipe.push({ "$geoNear": {
    "near": coords, 
    "distanceField": "distance", 
    "spherical": true
}});

pipe.push({ "$sort": {
    "type": 1, "distance": 1
}});

for ( var x = 1; x <= maxLen; x++ ) {

    fproj["$project"][""+x] = 1;
    fproj["$project"]["pos"]["$literal"].push( x );

    var rec = {
        "$cond": [ { "$eq": [ "$pos", x ] }, "$"+x ]
    };
    if ( stack.length == 0 ) {
        rec["$cond"].push( false );
    } else {
        lval = stack.pop();
        rec["$cond"].push( lval );
    }

    stack.push( rec );

    if ( x == 1) {
        pipe.push({ "$group": {
           "_id": key,
           "1": { "$first": val },
           "books": { "$push": val }
        }});
        pipe.push({ "$project": {
           "1": 1,
           "books": {
               "$cond": [
                    { "$eq": [ { "$size": "$books" }, 1 ] },
                    { "$literal": [false] },
                    "$books"
               ]
           }
        }});
    } else {
        pipe.push({ "$unwind": "$books" });
        var proj = {
            "$project": {
                "books": 1
            }
        };

        proj["$project"]["seen"] = { "$eq": [ "$"+(x-1), "$books" ] };

        var grp = {
            "$group": {
                "_id": "$_id",
                "books": {
                    "$push": {
                        "$cond": [ { "$not": "$seen" }, "$books", false ]
                    }
                }
            }
        };

        for ( n=x; n >= 1; n-- ) {
            if ( n != x ) 
                proj["$project"][""+n] = 1;
            grp["$group"][""+n] = ( n == x ) ? { "$first": "$books" } : { "$first": "$"+n };
        }

        pipe.push( proj );
        pipe.push({ "$sort": { "_id": 1, "seen": 1 } });
        pipe.push(grp);
    }
}

pipe.push(fproj);
pipe.push({ "$unwind": "$pos" });
pipe.push({
    "$group": {
        "_id": "$_id",
        "msgs": { "$push": stack[0] }
    }
});
pipe.push({ "$unwind": "$books" });
pipe.push({ "$match": { "books": { "$ne": false } }});
pipe.push({
    "$project": {
        "_id": "$books._id",
        "title": "$books.title",
        "author": "$books.author",
        "type": "$_id",
        "distance": "$books",
        "sortOrder": {
            "$add": [
                { "$cond": [ { "$eq": [ "$_id", "Fiction" ] }, 1, 0 ] },
                { "$cond": [ { "$eq": [ "$_id", "Science" ] }, 0, 0 ] },
                { "$cond": [ { "$eq": [ "$_id", "Horror" ] }, 3, 0 ] },
            ]
        }
    }
});
pipe.push({ "$sort": { "sortOrder": 1, "distance": 1 } });

替代

当然，这里的最终结果以及上述所有问题的一般问题是你真的只想要＆＃34;前10＆＃34;每个＆＃34;类型＆＃34;回来。聚合管道将会这样做，但代价是保持10以上，然后从堆栈弹出＃34;到达10点。

另一种方法是蛮力＆＃34;这与mapReduce和＆＃34;全局范围＆＃34;变量。不是很好，因为结果全部在数组中，但它可能是一个实用的方法：

db.collection.mapReduce(
    function () {

        if ( !stash.hasOwnProperty(this.type) ) {
            stash[this.type] = [];
        }

        if ( stash[this.type.length < maxLen ) {
            stash[this.type].push({
                "title": this.title,
                "author": this.author,
                "type": this.type,
                "distance": this.distance
            });
            emit( this.type, 1 );
        }

    },
    function(key,values) {
        return 1;   // really just want to keep the keys
    },
    { 
        "query": {
            "location": {
                "$nearSphere": [-118.09771, 33.89244]
            }
        },
        "scope": { "stash": {}, "maxLen": 10 },
        "finalize": function(key,value) {
            return { "msgs": stash[key] };                
        },
        "out": { "inline": 1 }
    }
)

这是一个真正的作弊，只使用＆＃34;全球范围＆＃34;保持其键为分组键的单个对象。将结果推送到该全局对象中的数组上，直到达到最大长度。结果已按最近的顺序排序，因此映射器只是在每个键达到10之后放弃对当前文档执行任何操作。

由于每个键只发出一个文档，因此不会调用reducer。最终确定然后＆＃34;拉出＆＃34;来自全局的值并将其返回到结果中。

很简单，但是如果你真的需要它们，你当然不会拥有所有$geoNear个选项，而且这个表单的硬限制为100个文档作为初始查询的输出。

返回某种类型的有限数量的记录，但其他记录的数量不限？

4 个答案:

问题

原理

实用性

实施

替代