Converting a MySQL query to MongoDB

Time: 2014-05-03 08:58:43

Tags: mysql mongodb mapreduce aggregation-framework

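I have started learning MongoDB and have run into a problem. I have a collection named server_logs.

It contains the following columns (SOURCE_SERVER, SOURCE_PORT, DESTINATION_PORT, DESTINATION_SERVER, MBYTES).

I need each SOURCE_SERVER together with the total MBYTES transferred for that SOURCE_SERVER. (One more point: if a source server also appears as a DESTINATION_SERVER, the MBYTES of those rows must be added to that SOURCE_SERVER's total as well.)

For example, I have the following table structure: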

  SOURCE   S_PORT   DEST    D_PORT  MBYTES
1)server1   446    server2   555     10MB
2)server3   226    server1   666     2MB
3)server1   446    server3   226     5MB

I need the following result:

Server1  17MB
Server3  7MB

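I wrote a query in MySQL that finds the top SOURCE according to the MBYTES of data transferred to that SOURCE. It works fine, and this query gives me the result I need in MySQL.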

SELECT SOURCE, DEST, sum( logs.MBYTES )+(
    SELECT SUM(log.MBYTES) as sum
    from logs as log
    where logs.DEST=log.SOURCE
) AS MBYTES

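I want to use this query in MongoDB. Please help.

Thanks in advance.

2 answers:

Answer 0: (score: 1)

It may not be obvious how to express this kind of "self-join" query in MongoDB, but it can be done with the aggregation framework. It just takes a slight change in how you think about the problem.

With the data in MongoDB in this form, which is still very much like the original SQL source: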

{ 
    "source" : "server1",
    "s_port" : 446,
    "dest" : "server2", 
    "d_port" : 555, 
    "transferMB" : 10
},
{ 
    "source" : "server3",
    "s_port" : 226,
    "dest" : "server1",
    "d_port" : 666,
    "transferMB" : 2
},
{ 
    "source" : "server1",
    "s_port" : 446, 
    "dest" : "server3",
    "d_port" : 226,
    "transferMB" : 5
}

With versions of MongoDB prior to 2.6, your query will look like this:

db.logs.aggregate([

    // Project a "type" tag in order to transform, then unwind
    { "$project": {
         "source": 1,
         "dest": 1,
         "transferMB": 1,
         "type": { "$cond": [ 1,[ "source", "dest" ],0] }
    }},
    { "$unwind": "$type" },

    // Map the "source" and "dest" servers onto the type, keep the source       
    { "$project": {
        "type": 1,
        "tag": { "$cond": [
            { "$eq": [ "$type", "source" ] },
            "$source",
            "$dest"
        ]},
        "mbytes": "$transferMB",
        "source": 1
    }},

    // Group for totals, keep an array of the "source" for each
    { "$group": {
        "_id": "$tag",
        "mbytes": { "$sum": "$mbytes" },
        "source": { "$addToSet": "$source" }
    }},


    // Unwind that array
    { "$unwind": "$source" },

    // Is our grouped tag one of the sources? This simulates an inner join
    { "$project": {
        "mbytes": 1,
        "matched": { "$eq": [ "$source", "$_id" ] }
    }},

    // Filter the results that did not match
    { "$match": { "matched": true }},


    // Discard duplicates for each server tag
    { "$group": { 
        "_id": "$_id",
        "mbytes": { "$first": "$mbytes" }
    }}
])
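
To see what the first three stages ($project, $unwind, $project) do to the documents, here is a minimal sketch in plain JavaScript that needs no MongoDB at all. The helper name `tagRecords` is made up for illustration, and the sample rows are the three example documents with the port fields left out. It only emulates the reshaping, not the server's actual execution:

```javascript
// Plain JavaScript emulation of the first stages; no MongoDB needed.
// The sample rows mirror the example documents (ports omitted).
const logs = [
  { source: "server1", dest: "server2", transferMB: 10 },
  { source: "server3", dest: "server1", transferMB: 2 },
  { source: "server1", dest: "server3", transferMB: 5 },
];

// tagRecords is a hypothetical helper, not a MongoDB API.
// $project + $unwind: produce one copy of each document per "type" value.
// Second $project: pick the server named by that type as the "tag".
function tagRecords(docs) {
  return docs.flatMap((doc) =>
    ["source", "dest"].map((type) => ({
      type,
      tag: type === "source" ? doc.source : doc.dest,
      mbytes: doc.transferMB,
      source: doc.source,
    }))
  );
}

const tagged = tagRecords(logs);
console.log(tagged.length); // each of the 3 documents now appears twice
console.log(tagged.map((r) => `${r.tag}:${r.mbytes}`).join(" "));
```

Every transfer is now counted once under its sender and once under its receiver, which is what lets a single $group on "tag" add up both directions.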

For versions 2.6 and later, you can use some additional operators to simplify the pipeline, or at least to approach it differently:

db.logs.aggregate([

    // Project a "type" tag in order to transform, then unwind
    { "$project": {
         "source": 1,
         "dest": 1,
         "transferMB": 1,
         "type": { "$literal": [ "source", "dest" ] }
    }},
    { "$unwind": "$type" },

    // Map the "source" and "dest" servers onto the type, keep the source       
    { "$project": {
        "type": 1,
        "tag": { "$cond": [
            { "$eq": [ "$type", "source" ] },
            "$source",
            "$dest"
        ]},
        "mbytes": "$transferMB",
        "source": 1
    }},

    // Group for totals, keep an array of the "source" for each
    { "$group": {
        "_id": "$tag",
        "mbytes": { "$sum": "$mbytes" },
        "source": { "$addToSet": "$source" }
    }},

    // Coerce the server tag into an array ( of one element )
    { "$group": {
        "_id": "$_id",
        "mbytes": { "$first": "$mbytes" },
        "source": { "$first": "$source" },
        "tags": { "$push": "$_id" }
    }},

    // Use set intersection to find common element count of arrays
    { "$project": {
       "mbytes": 1,
       "matched": { "$size": { 
           "$setIntersection": [
               "$source",
               "$tags"
           ]
       }}
    }},

    // Filter those that had nothing in common
    { "$match": { "matched": { "$gt": 0 } }},

    // Remove the un-required field
    { "$project": { "mbytes": 1 }}
])

Both forms produce the result:

{ "_id" : "server1", "mbytes" : 17 }
{ "_id" : "server3", "mbytes" : 7 }

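The general principle in both is that by keeping a list of the valid "source" servers, you can "filter" the combined results so that only servers listed as a source have their total transfer reported.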
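
That principle can also be sketched outside the aggregation framework. The following is a rough plain JavaScript equivalent, assuming the three example documents (ports omitted). The function name `totalsForSources` is invented for this illustration. It is meant to show the idea, not to replace the pipeline:

```javascript
// Plain JavaScript illustration of "total both directions, then keep
// only the servers that appear as a source". No MongoDB needed.
const logs = [
  { source: "server1", dest: "server2", transferMB: 10 },
  { source: "server3", dest: "server1", transferMB: 2 },
  { source: "server1", dest: "server3", transferMB: 5 },
];

// totalsForSources is a hypothetical helper name.
function totalsForSources(docs) {
  const totals = new Map(); // server -> MB seen as source or as dest
  const sources = new Set(); // servers that appear as a source at least once
  for (const { source, dest, transferMB } of docs) {
    totals.set(source, (totals.get(source) || 0) + transferMB);
    totals.set(dest, (totals.get(dest) || 0) + transferMB);
    sources.add(source);
  }
  // The "filter" step: drop servers that never act as a source.
  return [...totals]
    .filter(([server]) => sources.has(server))
    .map(([server, mbytes]) => ({ _id: server, mbytes }));
}

console.log(totalsForSources(logs));
```

Here server2 collects a total too, but it is dropped at the end because it never shows up in the set of sources.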

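So there are a few techniques you can use to "reshape", "combine", and "filter" your documents to get the result you want.

Read more about the aggregation operators. It is also worth looking at the SQL to Aggregation mapping chart in the documentation to get an idea of how common operations translate.

You could even browse the tags on Stack Overflow to find some interesting transformation operations.

Answer 1: (score: 0)

You can use the aggregation framework: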

db.logs.aggregate([
    {$group:{_id:"$SOURCE",MBYTES:{$sum:"$MBYTES"}}}
])

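This assumes the MBYTES field holds only numeric values. Grouping on SOURCE alone sums just the rows where a server is the sender, so for the example data you will get:

{
    _id: "server1",
    MBYTES: 15
},
{
    _id: "server3",
    MBYTES: 2
}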

If you also need to count the rows where the server appears in the DEST field, you should use the map-reduce approach:

var mapF = function(){
    emit(this.SOURCE,this.MBYTES);
    emit(this.DEST,this.MBYTES);
}

var reduceF = function(serverId, mbytesValues){
    // Return a plain number, the same type as the emitted values:
    // MongoDB may call reduce again on its own output, and it skips
    // reduce entirely for keys that have only one value.
    return Array.sum(mbytesValues);
};

db.logs.mapReduce(mapF,reduceF,{out:"server_stats"});

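After that you can find the results in the server_stats collection. Because the map function emits every DEST as well, servers that only ever appear as a destination (server2 in the example) get an entry too.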
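
The emit/reduce flow can be tried out without a database. Below is a minimal sketch in plain JavaScript, assuming the three example rows and a reduce function that returns a plain number. `runMapReduce` is a made-up stand-in for illustration, not how the server actually schedules the work. It also checks that reducing partial results gives the same total, and narrows the output to servers that occur as a SOURCE:

```javascript
// Plain JavaScript emulation of the map-reduce job; no MongoDB needed.
const logs = [
  { SOURCE: "server1", DEST: "server2", MBYTES: 10 },
  { SOURCE: "server3", DEST: "server1", MBYTES: 2 },
  { SOURCE: "server1", DEST: "server3", MBYTES: 5 },
];

// Assumed shape of the reduce function: numbers in, a number out.
const reduceF = (serverId, mbytesValues) =>
  mbytesValues.reduce((sum, value) => sum + value, 0);

// runMapReduce is a hypothetical helper that mimics emit() and reduce.
function runMapReduce(docs) {
  // Map phase: collect every emitted value under its key.
  const emitted = new Map();
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) {
    emit(doc.SOURCE, doc.MBYTES);
    emit(doc.DEST, doc.MBYTES);
  }
  // Reduce phase: one call per key here; MongoDB may split it into several.
  const stats = {};
  for (const [key, values] of emitted) stats[key] = reduceF(key, values);
  return stats;
}

const stats = runMapReduce(logs);
console.log(stats);

// Re-reduce check: reducing partial results must give the same total.
const partial = [reduceF("server1", [10, 2]), reduceF("server1", [5])];
console.log(reduceF("server1", partial) === stats.server1);

// Keep only the servers that appear as a SOURCE.
const sources = new Set(logs.map((doc) => doc.SOURCE));
const wanted = Object.fromEntries(
  Object.entries(stats).filter(([server]) => sources.has(server))
);
console.log(wanted);
```

In MongoDB itself the same narrowing would have to happen as a separate step after the job, for example by querying server_stats against the list of distinct SOURCE values.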