Question

I have a MongoDB collection (named 'links') that contains documents like the following:

{
    "_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
    "name" : "Pet 4056 AgR",
    "file" : "P0001J01",
    "quotes" : [
        {
            "_id" : ObjectId("544bc8afd4c66b0e3cf15173"),
            "name" : "Pet 4837 ED",
            "file" : "P1103J03"
        },
        {
            "_id" : ObjectId("544bc8b6d4c66b0e3cf19425"),
            "name" : "ACO 845 AgR",
            "file" : "P2810J07"
        },
        {
            "_id" : ObjectId("544bc8afd4c66b0e3cf14a77"),
            "name" : "ACO 1574 AgR",
            "file" : "P0924J05"
        }
    ]
}

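In my database, this means that this document quotes 3 other documents. Within any one document's quotes array, no two entries share the same _id / name / file. The name field is unique across the collection.

Now I need to get the most-quoted document, that is, the one that appears in the largest number of quotes arrays. How can I do this? I believe it can be done with aggregation, but I can't figure out how, especially because the name is inside an array.

Thanks! :)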

Answer 1

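You can do this with the aggregation framework. The key when working with arrays is to use the $unwind pipeline stage first, which "de-normalizes" the array content into separate documents: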

db.links.aggregate([
    // Unwind the array
    { "$unwind": "$quotes" },

    // Group by the inner "name" value and count the occurrences
    { "$group": {
        "_id": "$quotes.name",
        "count": { "$sum": 1 }
    }},

    // Sort descending so the highest count comes first
    { "$sort": { "count": -1 } },

    // Just return the largest value
    { "$limit": 1 }

])
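To make the logic of these stages concrete, here is a minimal plain-JavaScript sketch that computes the same kind of result in memory, without MongoDB. The function name mostQuoted and the sample data are hypothetical and only serve as illustration; the sketch sorts by count in descending order before taking the first result.

```javascript
// Plain-JavaScript sketch of what the pipeline computes (no MongoDB needed).
// mostQuoted and sampleLinks are hypothetical names used for illustration.
function mostQuoted(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Like $unwind: visit each element of the quotes array on its own.
    // Documents with an empty or missing array contribute nothing.
    for (const quote of doc.quotes || []) {
      // Like $group with { $sum: 1 }, keyed by quotes.name
      counts.set(quote.name, (counts.get(quote.name) || 0) + 1);
    }
  }
  // Like $sort (descending by count) followed by $limit: 1
  return [...counts.entries()]
    .map(([name, count]) => ({ _id: name, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, 1);
}

const sampleLinks = [
  { name: "Pet 4056 AgR", quotes: [{ name: "Pet 4837 ED" }, { name: "ACO 845 AgR" }] },
  { name: "ACO 1574 AgR", quotes: [{ name: "Pet 4837 ED" }] },
  { name: "Pet 4837 ED", quotes: [] },
];

console.log(mostQuoted(sampleLinks)); // [ { _id: 'Pet 4837 ED', count: 2 } ]
```

Note that taking only one result hides ties: if two documents are quoted equally often, only one of them is returned.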

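What $unwind does here is, for each array element, take the "outer" document that owns the array and produce a new document containing the outer fields plus that single array element. Basically like this: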

{
    "_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
    "name" : "Pet 4056 AgR",
    "file" : "P0001J01",
    "quotes" : 
        {
            "_id" : ObjectId("544bc8afd4c66b0e3cf15173"),
            "name" : "Pet 4837 ED",
            "file" : "P1103J03"
        }
},
{
    "_id" : ObjectId("544bc8abd4c66b0e3cf12665"),
    "name" : "Pet 4056 AgR",
    "file" : "P0001J01",
    "quotes" : 
        {
            "_id" : ObjectId("544bc8b6d4c66b0e3cf19425"),
            "name" : "ACO 845 AgR",
            "file" : "P2810J07"
        }
}

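This lets the other aggregation pipeline stages access the content just like any normal document, so you can $group on "quotes.name" and count the occurrences without any problem.

Take a good look at all of the aggregation pipeline operators; it is worth understanding what each of them does.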
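As a rough illustration of that transformation, the following plain-JavaScript sketch mimics $unwind for a single array field. The helper unwind is hypothetical and simplified: it only handles array values and skips documents where the field is missing or not an array.

```javascript
// Hypothetical helper that mimics $unwind on one array field (illustration only).
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const elements = Array.isArray(doc[field]) ? doc[field] : [];
    for (const element of elements) {
      // Copy the outer fields and replace the array with a single element
      out.push({ ...doc, [field]: element });
    }
  }
  return out;
}

const linkDoc = {
  name: "Pet 4056 AgR",
  file: "P0001J01",
  quotes: [
    { name: "Pet 4837 ED", file: "P1103J03" },
    { name: "ACO 845 AgR", file: "P2810J07" },
    { name: "ACO 1574 AgR", file: "P0924J05" },
  ],
};

const unwound = unwind([linkDoc], "quotes");
console.log(unwound.length);                    // 3
console.log(unwound.map((d) => d.quotes.name)); // one quoted name per output document
```

Each output document keeps the outer name and file, which is why a later stage can refer to quotes.name as an ordinary field.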

MongoDB: retrieve the most-quoted document
