Get retweet count from user timeline

Time: 2016-04-08 09:33:04

Tags: javascript mongodb mapreduce mongodb-query aggregation-framework

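I collected my own tweets with the Twitter API endpoint 'statuses/user_timeline' and stored them in MongoDB. I am trying to get the retweet count for those tweets using MongoDB's mapReduce, but I can't get it to work. Can anyone help?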

Sample data: this is the format of the documents stored in MongoDB

{
    "_id" : ObjectId("570664d7a9c29761168b4587"),
    "created_at" : "Thu Sep 17 01:17:28 +0000 2015",
    "id" : NumberLong("644319222886039556"),
    "id_str" : "644319222886039556",
    "text" : "Be silent or let your words be worth more than you silence.",
    "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [ ],
        "urls" : [ ]
    },
    "truncated" : false,
    "source" : "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
    "in_reply_to_status_id" : null,
    "in_reply_to_status_id_str" : null,
    "in_reply_to_user_id" : null,
    "in_reply_to_user_id_str" : null,
    "in_reply_to_screen_name" : null,
    "user" : {
        // Information about the user who tweeted
        "id" : NumberLong(xxxxxxxxxxxxxxxxx),
        "id_str" : "xxxxxxxxx",
        "name" : "Haridarshan Gorana",
        "screen_name" : "haridarshan2901"
    },
    "geo" : null,
    "coordinates" : null,
    "place" : null,
    "contributors" : null,
    "is_quote_status" : false,
    "retweet_count" : NumberLong(1),
    "favorite_count" : NumberLong(0),
    "favorited" : false,
    "retweeted" : false,
    "lang" : "en"
}

Code:

$map = new \MongoCode("function() { emit(this.id_str, this.retweet_count); }");
$out = "retweets";
$reduce = new \MongoCode('function(key, values) {
    var retweets = 0;
    for(i=0;i<values.length;i++){

        if( values[i].retweet_count > 0 ){
            retweets += values[i].retweet_count;
        }

    }
    return retweets;
}');
$verbose = true;
$cmd = array(
    "map" => $map,
    "reduce" => $reduce,
    "query" => $query,
    "out" => "retweets",
    "verbose" => true
);

$result = $db->command($cmd);

print_r($result);

This gives me the following error:

Fatal error: Call to a member function command() on null

I tried running the same code in the mongo shell:

var mapFunction1 = function() {
    emit(this.id_str, this.retweet_count);
}

var reduceFunction1 = function(id, values) { 
    var retweet = 0; 
    for(i=0;i<values.length;i++){ 
        if(values[i].retweet_count > 0) { 
            retweet += values[i].retweet_count;
        } 
    } 
    return retweet;  
}

db.tweets.mapReduce(
    mapFunction1, 
    reduceFunction1, 
    {
        query: { 
            user: { id: xxxxxxxxx }
        }, 
        out: "retweets", 
        verbose: true
    }
)

Output from the console:

{
    "result" : "retweets",
    "timeMillis" : 12,
    "timing" : {
        "mapTime" : 0,
        "emitLoop" : 8,
        "reduceTime" : 0,
        "mode" : "mixed",
        "total" : 12
    },
    "counts" : {
        "input" : 0,
        "emit" : 0,
        "reduce" : 0,
        "output" : 0
    },
    "ok" : 1
}

1 answer:

Answer 0: (score: 3)

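Your reducer tries to read a retweet_count property when all that is there is a plain "value" with no other properties. You already dereferenced that field in the mapper, which emits this.retweet_count directly.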

So your reduce really only needs to be:

function(key,values) {
    return Array.sum(values)
}
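
To make the difference concrete, here is a minimal plain-JavaScript sketch that imitates the map and reduce steps outside MongoDB. It is an illustration under assumptions, not how the server runs the job: the sample documents, the helpers runMap and runReduce, and the choice to key on user.id_str (so that several values share one key) are all made up for the example, and the helper calls reduce for every key, whereas MongoDB only calls it for keys with more than one value.

```javascript
// Hypothetical sample documents; only the fields used here are included.
const docs = [
  { user: { id_str: "u1" }, retweet_count: 1 },
  { user: { id_str: "u1" }, retweet_count: 2 },
  { user: { id_str: "u2" }, retweet_count: 5 },
];

// Stand-in for the map phase: collect the emitted values per key.
// As in the question's mapper, the emitted value is a bare number.
function runMap(documents) {
  const emitted = new Map();
  for (const doc of documents) {
    const key = doc.user.id_str;
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(doc.retweet_count);
  }
  return emitted;
}

// Reducer as written in the question: it reads a property the numbers lack.
function brokenReduce(key, values) {
  let retweets = 0;
  for (let i = 0; i < values.length; i++) {
    // values[i] is a number, so values[i].retweet_count is undefined
    if (values[i].retweet_count > 0) {
      retweets += values[i].retweet_count;
    }
  }
  return retweets;
}

// Corrected reducer: sum the emitted numbers directly
// (the shell's Array.sum(values) does the same job).
function fixedReduce(key, values) {
  return values.reduce((total, value) => total + value, 0);
}

// Stand-in for the reduce phase (simplified: called for every key).
function runReduce(emitted, reduceFn) {
  const out = {};
  for (const [key, values] of emitted) {
    out[key] = reduceFn(key, values);
  }
  return out;
}

const emitted = runMap(docs);
const brokenResult = runReduce(emitted, brokenReduce);
const fixedResult = runReduce(emitted, fixedReduce);
console.log(brokenResult, fixedResult);
```

With the question's reducer every total stays at zero, because the comparison against undefined is never true; the corrected reducer adds up the emitted numbers.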

But you are better off just using .aggregate(). Not only is it simpler, it also runs much faster:

db.tweets.aggregate([
  { "$group": {
    "_id": "$user.id_str",
    "retweets": { "$sum": "$retweet_count" }
  }}
])

Or in PHP:

$collection->aggregate(
    array(
        '$group' => array(
           '_id' => '$user.id_str',
           'retweets' => array( '$sum' => '$retweet_count' )
        )
    )
)

If you want to add a "query", put a $match pipeline stage at the start, i.e.:

$collection->aggregate(
    array(
        '$match' => array(
            'user.id_str' => 'xxxxxxxxx'
        )
    ),    
    array(
        '$group' => array(
           '_id' => '$user.id_str',
           'retweets' => array( '$sum' => '$retweet_count' )
        )
    )
)
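
As a rough way to see what the two stages compute, the following plain-JavaScript sketch applies the same filter and sum to an in-memory array. It is only an approximation of the pipeline's semantics under assumptions: the sample tweets and the helper names matchUser and groupRetweets are invented for the example. As an aside, the shell query in the question, user: { id: ... }, asks for an embedded document that equals { id: ... } exactly, which is probably why that run reported "input" : 0; the dotted "user.id_str" form matches on the single field instead.

```javascript
// Hypothetical sample tweets shaped like the stored documents.
const tweets = [
  { id_str: "t1", user: { id_str: "u1", name: "A" }, retweet_count: 1 },
  { id_str: "t2", user: { id_str: "u1", name: "A" }, retweet_count: 4 },
  { id_str: "t3", user: { id_str: "u2", name: "B" }, retweet_count: 7 },
];

// Rough equivalent of { $match: { "user.id_str": userId } }:
// compare one nested field, not the whole embedded document.
function matchUser(documents, userId) {
  return documents.filter((doc) => doc.user && doc.user.id_str === userId);
}

// Rough equivalent of
// { $group: { _id: "$user.id_str", retweets: { $sum: "$retweet_count" } } }
function groupRetweets(documents) {
  const totals = new Map();
  for (const doc of documents) {
    const key = doc.user.id_str;
    // Treat a missing count as 0, so it does not affect the sum.
    totals.set(key, (totals.get(key) || 0) + (doc.retweet_count || 0));
  }
  return Array.from(totals, ([_id, retweets]) => ({ _id, retweets }));
}

const summary = groupRetweets(matchUser(tweets, "u1"));
const allUsers = groupRetweets(tweets);
console.log(summary, allUsers);
```

Filtering first keeps the grouping step small, which is the same reason the $match stage goes at the start of the pipeline.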

You should only use mapReduce when the structure actually requires JavaScript control flow to process.