MongoDB Aggregate/Unwind/Group/Project query combination

Time: 2013-05-23 13:42:26

Tags: ruby mongodb aggregation-framework

I have records in a collection in the following format:

{
    "_id" : "2013-05-23",
    "authors_who_sold_books" : [
        {
            "id" : "Charles Dickens",
            "num_sold" : 1,
            "customers" : [
                {
                   "time_bought" : 1368627290,
                   "customer_id" : 9715923
                }
            ]
        },
        {
            "id" : "JRR Tolkien",
            "num_sold" : 2,
            "customers" : [
                {
                    "date_bought" : 1368540890,
                    "customer_id" : 9872345
                },
                {
                    "date_bought" : 1368537290,
                    "customer_id" : 9163893
                }
            ]
        }
    ]
}

There is one record per date, and many of those dates contain the same author. I'm after a query that returns the following:

{
    "_id" : "Charles Dickens",
    "num_sold" : 235,
    "customers" : [
        {
            "date_bought" : 1368627290,
            "customer_id" : 9715923
        },
        {
            "date_bought" : 1368622358,
            "customer_id" : 9876234
        },
        etc...
    ]
}

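I've tried various combinations of aggregate, group, unwind and project but still can't quite get there, and I would really appreciate any suggestions.

For extra points, I'm actually doing this with the Ruby gem, so code specific to that would be great. However, I can convert from the normal MongoDB query language.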

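1 answer:

Answer 0 (score: 8)

I took your sample data, slightly modified it to make a second document, and then added both to a test collection. The documents I used are as follows: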

{
    "_id" : "2013-05-23",
    "authors_who_sold_books" : [
        {
            "id" : "Charles Dickens",
            "num_sold" : 1,
            "customers" : [
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                }
            ]
        },
        {
            "id" : "JRR Tolkien",
            "num_sold" : 2,
            "customers" : [
                {
                    "date_bought" : 1368540890,
                    "customer_id" : 9872345
                },
                {
                    "date_bought" : 1368537290,
                    "customer_id" : 9163893
                }
            ]
        }
    ]
}
{
    "_id" : "2013-05-21",
    "authors_who_sold_books" : [
        {
            "id" : "Charles Dickens",
            "num_sold" : 3,
            "customers" : [
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                },
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                },
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                }
            ]
        },
        {
            "id" : "JRR Tolkien",
            "num_sold" : 1,
            "customers" : [
                {
                    "date_bought" : 1368540890,
                    "customer_id" : 9872345
                }
            ]
        }
    ]
}

Now, to get the result you expected, I used the aggregation framework and ran this query:

db.collection.aggregate([
    {
        // First we unwind all the authors that sold books
        $unwind: '$authors_who_sold_books',
    },
    {
        // Next, we unwind each of the customers that purchased a book
        $unwind: '$authors_who_sold_books.customers'
    },
    {
        // Now we group them by "Author Name" (hoping they are unique!)
        $group: {
            _id: '$authors_who_sold_books.id',
            // Increment the number sold by each author
            num_sold: {
                $sum: 1
            },
            // Add the customer data to the array
            customers: {
                $push: '$authors_who_sold_books.customers'
            }
        }
    }
]);

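I tried to comment the code above so that it makes more sense. Basically, it unwinds the data twice to create one document per sale for each author: first it unwinds authors_who_sold_books, then it unwinds authors_who_sold_books.customers.

The next step is to group the unwound documents by author, push every customer into the customers array, and increment num_sold by 1 for each unwound document.

The result looks like: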
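The unwind-then-group steps described above can be mimicked on plain arrays, which may make it easier to see what each stage produces. This is only an in-memory sketch in plain JavaScript, not how MongoDB executes the pipeline; the variable names (perAuthor, perSale, grouped) are made up for illustration, and the data is the two test documents from this answer.

```javascript
// In-memory sketch of the pipeline: two unwinds followed by a group.
// Nothing here talks to MongoDB; the intermediate names are hypothetical.
const docs = [
  {
    _id: '2013-05-23',
    authors_who_sold_books: [
      { id: 'Charles Dickens', num_sold: 1,
        customers: [{ time_bought: 1368627290, customer_id: 9715923 }] },
      { id: 'JRR Tolkien', num_sold: 2,
        customers: [
          { date_bought: 1368540890, customer_id: 9872345 },
          { date_bought: 1368537290, customer_id: 9163893 },
        ] },
    ],
  },
  {
    _id: '2013-05-21',
    authors_who_sold_books: [
      { id: 'Charles Dickens', num_sold: 3,
        customers: [
          { time_bought: 1368627290, customer_id: 9715923 },
          { time_bought: 1368627290, customer_id: 9715923 },
          { time_bought: 1368627290, customer_id: 9715923 },
        ] },
      { id: 'JRR Tolkien', num_sold: 1,
        customers: [{ date_bought: 1368540890, customer_id: 9872345 }] },
    ],
  },
];

// First unwind: one document per (date, author) pair.
const perAuthor = docs.flatMap((doc) =>
  doc.authors_who_sold_books.map((author) => ({ _id: doc._id, author }))
);

// Second unwind: one document per (date, author, customer), i.e. per sale.
const perSale = perAuthor.flatMap(({ _id, author }) =>
  author.customers.map((customer) => ({ _id, authorId: author.id, customer }))
);

// Group: key on the author, count the documents, collect the customers.
const grouped = new Map();
for (const { authorId, customer } of perSale) {
  if (!grouped.has(authorId)) {
    grouped.set(authorId, { _id: authorId, num_sold: 0, customers: [] });
  }
  const entry = grouped.get(authorId);
  entry.num_sold += 1;            // plays the role of $sum: 1
  entry.customers.push(customer); // plays the role of $push
}

const result = [...grouped.values()];
console.log(JSON.stringify(result, null, 2));
```

The order of the groups in this sketch follows insertion order; the real $group stage makes no such promise.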

{
    "result" : [
        {
            "_id" : "JRR Tolkien",
            "num_sold" : 3,
            "customers" : [
                {
                    "date_bought" : 1368540890,
                    "customer_id" : 9872345
                },
                {
                    "date_bought" : 1368537290,
                    "customer_id" : 9163893
                },
                {
                    "date_bought" : 1368540890,
                    "customer_id" : 9872345
                }
            ]
        },
        {
            "_id" : "Charles Dickens",
            "num_sold" : 4,
            "customers" : [
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                },
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                },
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                },
                {
                    "time_bought" : 1368627290,
                    "customer_id" : 9715923
                }
            ]
        }
    ],
    "ok" : 1
}
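One caveat: the query above computes num_sold by counting customer entries, not by adding up the stored num_sold values. The two agree for the test documents, but they would diverge if a customers array were ever shorter or longer than its num_sold. If the stored field should be the source of truth, a variant along the following lines might work. It is a sketch under an assumption: it relies on $reduce and $concatArrays, which need MongoDB 3.4 or later and were not available when this answer was written.

```javascript
// Sketch only: assumes MongoDB 3.4+ for $reduce and $concatArrays.
const pipeline = [
  // One document per (date, author) pair; customers stay nested.
  { $unwind: '$authors_who_sold_books' },
  {
    $group: {
      _id: '$authors_who_sold_books.id',
      // Add up the stored per-day totals instead of counting documents.
      num_sold: { $sum: '$authors_who_sold_books.num_sold' },
      // Collects one customers array per day, giving an array of arrays.
      customers: { $push: '$authors_who_sold_books.customers' },
    },
  },
  {
    $project: {
      num_sold: 1,
      // Flatten the array of arrays into a single customers array.
      customers: {
        $reduce: {
          input: '$customers',
          initialValue: [],
          in: { $concatArrays: ['$$value', '$$this'] },
        },
      },
    },
  },
];

// In the mongo shell this would be run as: db.collection.aggregate(pipeline)
```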

Hopefully this helps you figure out your real solution :)