
Question

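I need a custom query builder for MongoDB. I have already built the user interface that lists the document fields available for querying. The user can choose "Result columns", "Conditions", "Group by" and "Order by". Let me explain using SQL; see the example: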

SELECT col1, col2 FROM table WHERE col1=1 AND col2="foo" OR col3 > "2012-01-01 00:00:00" OR col3 < "2012-01-02 00:00:00" AND col5 IN (100, 101, 102) GROUP BY col4, col5 ORDER BY col1 DESC, col2 ASC

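That is:

SELECT col1, col2 - the result columns
WHERE col1=1 AND col2="foo" OR col3 > "2012-01-01 00:00:00" OR col3 < "2012-01-02 00:00:00" - the conditions
GROUP BY col4, col5 - the group by statement
ORDER BY col1 DESC, col2 ASC - the order by statement

The columns, conditions, group by and order by (in any number) should be generated with Python from the JSON data used by the user interface.

I'm just curious whether this is something MapReduce for MongoDB can do. Do you know of any module for this? Also, if you're good with MongoDB, could you please translate this SQL query into a MongoDB query?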

Answer 1

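The easiest (and most scalable) solution is probably to translate your filter conditions into a MongoDB query, and do the aggregation on the client side.

Following your example above, let's break it down and build a MongoDB query (I'll show this using PyMongo, but you could do it with Mongoengine or another ODM if you prefer):

WHERE col1=1 AND col2="foo" OR col3 > "2012-01-01 00:00:00" OR col3 < "2012-01-02 00:00:00" - the conditions

This is the first argument to PyMongo's find() method. Keys inside a single query document are ANDed implicitly, so we only have to use the $or operator to build the logical AND/OR tree explicitly: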

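from datetime import datetime

from bson.tz_util import utc

# integer literals must not have leading zeros (01 is a syntax error in Python 3)
cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]})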

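Note that MongoDB does not convert strings to dates when comparing against date/time fields, so I've done the conversion explicitly here using Python's datetime module. The datetime class in that module uses 0 as the default for any time component you leave out (hour, minute, second, microsecond).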
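Since the conditions are meant to come from the UI's JSON, the tree above can also be generated rather than written by hand. The following is a minimal sketch, not a definitive implementation: the JSON shape (`and`/`or` nodes, `field`/`op`/`value`/`type` leaves), the operator names and the helper names `parse_value` and `build_filter` are all assumptions made for illustration, as is the choice to treat date strings as UTC.

```python
from datetime import datetime, timezone

# Hypothetical operator names sent by the UI, mapped to MongoDB query operators.
OPERATORS = {'gt': '$gt', 'gte': '$gte', 'lt': '$lt', 'lte': '$lte',
             'ne': '$ne', 'in': '$in'}


def parse_value(value, value_type=None):
    """Convert UI strings to the Python type MongoDB should compare against."""
    if value_type == 'datetime':
        parsed = datetime.strptime(value, '%Y-%m-%d %H:%M:%S')
        return parsed.replace(tzinfo=timezone.utc)  # assumption: the UI sends UTC
    return value


def build_filter(node):
    """Recursively turn a condition tree into a MongoDB filter document."""
    if 'and' in node:
        return {'$and': [build_filter(child) for child in node['and']]}
    if 'or' in node:
        return {'$or': [build_filter(child) for child in node['or']]}
    value = parse_value(node['value'], node.get('type'))
    if node['op'] == 'eq':
        return {node['field']: value}
    return {node['field']: {OPERATORS[node['op']]: value}}


# The WHERE clause from the question, expressed in the assumed JSON shape.
conditions = {'or': [
    {'and': [
        {'field': 'col1', 'op': 'eq', 'value': 1},
        {'field': 'col2', 'op': 'eq', 'value': 'foo'},
    ]},
    {'field': 'col3', 'op': 'gt', 'value': '2012-01-01 00:00:00', 'type': 'datetime'},
    {'field': 'col3', 'op': 'lt', 'value': '2012-01-02 00:00:00', 'type': 'datetime'},
]}
query_filter = build_filter(conditions)
```

The resulting `query_filter` could be passed as the first argument to find(). Looking operators up in a fixed table, instead of passing user-supplied operator strings straight through, keeps the UI from injecting arbitrary query operators.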

SELECT col1, col2 - the result columns

We can use field selection to retrieve only the fields we want:

from datetime import datetime

from bson.tz_util import utc

# PyMongo 3+ calls this argument `projection` (it was `fields` in PyMongo 2.x)
cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, projection=['col1', 'col2'])
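For a query builder, the field selection would be derived from the "Result columns" the user picked. A small sketch of that step follows; the helper name `build_projection` and its `include_id` flag are hypothetical. It relies on the fact that MongoDB returns `_id` unless it is excluded explicitly.

```python
def build_projection(columns, include_id=False):
    """Build a MongoDB projection document from a list of column names."""
    projection = {name: 1 for name in columns}
    if not include_id:
        # _id comes back by default; exclude it unless the user selected it
        projection.setdefault('_id', 0)
    return projection


result_columns = ['col1', 'col2']
projection = build_projection(result_columns)
```

When grouping is done client-side, the group-by columns also have to be fetched, so the list passed in would be the result columns plus the group-by columns.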

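GROUP BY col4, col5 - the group by statement

This can't be done efficiently with standard MongoDB queries (though I'll show later how to do it server-side with the new Aggregation Framework). Instead, knowing that we want to group by these columns, we can make the application code simpler by sorting on those fields: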

from datetime import datetime

from bson.tz_util import utc
from pymongo import ASCENDING

# fetch the group-by columns too, and sort on them so equal keys are adjacent
cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, projection=['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])

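ORDER BY col1 DESC, col2 ASC - the order by statement

This should be done in your application code, after applying whatever aggregate functions you want (suppose we want to sum col4 and take the max of col5):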

from datetime import datetime
from itertools import groupby

from bson.tz_util import utc
from pymongo import ASCENDING

cursor = db.collection.find({'$or': [
    {'col1': 1, 'col2': 'foo'},
    {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
    {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
]}, projection=['col1', 'col2', 'col4', 'col5'])
cursor.sort([('col4', ASCENDING), ('col5', ASCENDING)])

# groupby REQUIRES that the iterable be sorted on the grouping key to
# work correctly; we've asked Mongo to sort by (col4, col5), so we
# don't need to do so explicitly here.
groups = groupby(cursor, key=lambda doc: (doc['col4'], doc['col5']))
out = []
for (col4, col5), docs in groups:
    col4sum = 0
    col5max = float('-inf')
    for doc in docs:
        col4sum += doc['col4']
        col5max = max(col5max, doc['col5'])
    out.append({
        'col4': col4,
        'col5': col5,
        'col4sum': col4sum,
        'col5max': col5max
    })
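The grouping step and the final ORDER BY can be tried without a database. The sketch below uses plain dicts standing in for the documents a cursor would yield; the data, the aggregated fields `col6` and `col7`, and the helper names `group_rows` and `order_rows` are made up for illustration. It also shows one way to apply a mixed-direction ORDER BY in Python, relying on the stability of `sorted`.

```python
from itertools import groupby


def group_rows(docs, group_by, sum_field, max_field):
    """Group docs by the group_by fields, summing one field and taking the max of another."""
    def key(doc):
        return tuple(doc[name] for name in group_by)

    rows = []
    # sorted() stands in for the server-side cursor.sort(); groupby needs sorted input
    for values, members in groupby(sorted(docs, key=key), key=key):
        members = list(members)
        row = dict(zip(group_by, values))
        row[sum_field + 'sum'] = sum(doc[sum_field] for doc in members)
        row[max_field + 'max'] = max(doc[max_field] for doc in members)
        rows.append(row)
    return rows


def order_rows(rows, order_by):
    """Apply ORDER BY with mixed directions using Python's stable sort."""
    # Sort on the least significant key first; stability keeps that order on ties.
    for name, direction in reversed(order_by):
        rows = sorted(rows, key=lambda row: row[name], reverse=(direction == 'desc'))
    return rows


# Made-up documents; col6 and col7 are hypothetical fields to aggregate.
docs = [
    {'col4': 'a', 'col5': 1, 'col6': 2, 'col7': 5},
    {'col4': 'b', 'col5': 1, 'col6': 1, 'col7': 9},
    {'col4': 'a', 'col5': 1, 'col6': 3, 'col7': 7},
    {'col4': 'a', 'col5': 2, 'col6': 4, 'col7': 1},
]
rows = group_rows(docs, ['col4', 'col5'], sum_field='col6', max_field='col7')
ordered = order_rows(rows, [('col5', 'desc'), ('col4', 'asc')])
```

Only fields that survive the grouping (the group keys and the aggregates) can be used in `order_by` here, which mirrors how ORDER BY behaves after GROUP BY in SQL.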

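Using the Aggregation Framework

If you're using MongoDB 2.1 or later (at the time of writing, 2.1.x is the development series leading up to the 2.2.0 stable release, which is expected soon), you can use the aggregation framework to do all of this server-side. To do so, use the aggregate command: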

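from datetime import datetime

from bson.son import SON
from bson.tz_util import utc
from pymongo import ASCENDING, DESCENDING

group_key = SON([('col4', '$col4'), ('col5', '$col5')])
# sort keys are plain field names (no '$' prefix). After $group only _id,
# col4sum and col5max remain, so col1/col2 must be carried through the
# $group stage (e.g. with $first) for this sort to have any effect.
sort_key = SON([('col1', DESCENDING), ('col2', ASCENDING)])
# On MongoDB 3.6+ the raw command also needs a cursor option;
# db.collection_name.aggregate(pipeline) supplies it automatically.
db.command('aggregate', 'collection_name', pipeline=[
    # this is like the WHERE clause
    {'$match': {'$or': [
        {'col1': 1, 'col2': 'foo'},
        {'col3': {'$gt': datetime(2012, 1, 1, tzinfo=utc)}},
        {'col3': {'$lt': datetime(2012, 1, 2, tzinfo=utc)}},
        ]}},
    # SELECT sum(col4), max(col5) ... GROUP BY col4, col5
    {'$group': {
        '_id': group_key,
        'col4sum': {'$sum': '$col4'},
        'col5max': {'$max': '$col5'}}},
    # ORDER BY col1 DESC, col2 ASC
    {'$sort': sort_key}
])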

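As of MongoDB 2.2, the aggregate command returns a BSON document (i.e. a Python dictionary), which is subject to the usual MongoDB restriction: if the document to be returned exceeds 16MB, the command fails. Additionally, for in-memory sorts (as required by the $sort at the end of this aggregation), the aggregation framework fails if the sort needs more than 10% of the physical RAM on the server. This prevents costly aggregations from evicting all the memory Mongo uses for data files.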
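For the query builder, the pipeline itself can be assembled from the user's choices. The sketch below is one possible shape, not a definitive implementation: the function name `build_pipeline`, its parameters and the format of `aggregates` and `order_by` are assumptions. It only builds the list of stages, so it runs without a server. Note that after `$group` the grouped fields live under `_id`, so sort keys have to name output fields such as `col4sum` or `_id.col5`.

```python
def build_pipeline(match_filter, group_by, aggregates, order_by=None):
    """Assemble $match / $group / $sort stages from query-builder input.

    aggregates maps an output name to an (operator, field) pair,
    e.g. {'col4sum': ('$sum', 'col4')}.
    order_by is a list of (output_field, 'asc' | 'desc') pairs.
    """
    group_stage = {'_id': {name: '$' + name for name in group_by}}
    for out_name, (operator, field) in aggregates.items():
        group_stage[out_name] = {operator: '$' + field}

    pipeline = [{'$match': match_filter}, {'$group': group_stage}]
    if order_by:
        # dicts keep insertion order in Python 3.7+, which preserves sort
        # priority; on older Pythons bson.son.SON would be the safer choice
        pipeline.append({'$sort': {
            name: (-1 if direction == 'desc' else 1)
            for name, direction in order_by
        }})
    return pipeline


pipeline = build_pipeline(
    {'col1': 1},
    group_by=['col4', 'col5'],
    aggregates={'col4sum': ('$sum', 'col4'), 'col5max': ('$max', 'col5')},
    order_by=[('col4sum', 'desc'), ('_id.col5', 'asc')],
)
```

With PyMongo, the resulting list could then be handed to the collection's aggregate() method.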

Answer 2

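What exactly is your question? Of course you can run these queries with Mongo, and MapReduce has nothing to do with it. If you want a quick start with Mongo, you could try an ORM-like mapper such as mongoengine.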

Building a JSON-based MongoDB query from user input with Python

2 Answers:
