Slow Mongo cursor loop

Time: 2013-06-08 19:43:33

Tags: r mongodb rmongodb

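I have the following procedure for a collection that is indexed on rev_timestamp and holds about 200 million records. It runs very fast, but during the loop it pauses for 10-50 seconds roughly every 10 seconds. Why is that?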

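library(rmongodb)
# `mongo` is assumed to be an already open connection (e.g. from mongo.create())

# Build the query { rev_timestamp: { $lte: 20060201000000, $gte: 20060101000000 } }
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "rev_timestamp")
mongo.bson.buffer.append(buf, "$lte", 20060201000000)
mongo.bson.buffer.append(buf, "$gte", 20060101000000)
mongo.bson.buffer.finish.object(buf)
query <- mongo.bson.from.buffer(buf)
ns <- "enwiki.revision"
# Count the matching documents so the result vectors can be preallocated
no <- mongo.count(mongo, ns, query)
# Fetch only the two fields that are needed
cursor <- mongo.find(mongo, ns, query, fields = list(rev_user = 1L, rev_user_text = 1L))
# Convert to table
rev_user <- vector("integer", no)
rev_user_text <- vector("character", no)
i <- 1
while (mongo.cursor.next(cursor)) {
  b <- mongo.cursor.value(cursor)
  rev_user[i] <- mongo.bson.value(b, "rev_user")
  rev_user_text[i] <- mongo.bson.value(b, "rev_user_text")
  i <- i + 1
  cat(i,"\n")  # progress output: index of the next slot to fill
}
mongo.cursor.destroy(cursor)  # release the cursor once the loop is done
totalusers <- as.data.frame(list(user=rev_user, user_text=rev_user_text))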
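To pin down where the pauses fall, the individual cursor calls can be timed. The following is only a rough sketch: `find_slow_calls` and `make_fake_cursor` are hypothetical helpers, not part of rmongodb, and the fake cursor merely imitates a cursor that blocks at batch boundaries so that the example runs without a server.

```r
# Hypothetical helper: call next_fn() until it returns FALSE and record
# every call that took longer than `threshold` seconds.
find_slow_calls <- function(next_fn, threshold = 1) {
  slow_iter <- integer(0)
  slow_sec <- numeric(0)
  i <- 0L
  repeat {
    t0 <- proc.time()[["elapsed"]]
    more <- next_fn()
    elapsed <- proc.time()[["elapsed"]] - t0
    if (!more) break
    i <- i + 1L
    if (elapsed > threshold) {
      slow_iter <- c(slow_iter, i)
      slow_sec <- c(slow_sec, elapsed)
    }
  }
  data.frame(iteration = slow_iter, seconds = slow_sec)
}

# Stand-in for a real cursor (assumption, for illustration only):
# yields n items and sleeps `delay` seconds before each new batch.
make_fake_cursor <- function(n, batch, delay) {
  pos <- 0L
  function() {
    if (pos >= n) return(FALSE)
    if (pos > 0L && pos %% batch == 0L) Sys.sleep(delay)
    pos <<- pos + 1L
    TRUE
  }
}

slow <- find_slow_calls(make_fake_cursor(n = 10L, batch = 4L, delay = 0.2),
                        threshold = 0.1)
print(slow)
```

With a real connection the helper would be called as `find_slow_calls(function() mongo.cursor.next(cursor))`, with the per-document work moved into that function.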

The log shows the find completing:

Sat Jun  8 23:26:32.471 [initandlisten] connection accepted from 127.0.0.1:37097 #65 (3 connections now open)
Sat Jun  8 23:26:32.821 [conn65] command enwiki.$cmd command: { count: "revision", query: { rev_timestamp: { $lte: 20060201000000.0, $gte: 20060101000000.0 } } } ntoreturn:1 keyUpdates:0 numYields: 6 locks(micros) r:690301 reslen:48 348ms
Sat Jun  8 23:34:59.163 [conn69] getmore enwiki.revision query: { rev_timestamp: { $lte: 20060201000000.0, $gte: 20060101000000.0 } } cursorid:166545066544259763 ntoreturn:0 keyUpdates:0 numYields: 5 locks(micros) r:565614 nreturned:63397 reslen:4194315 325ms
Sat Jun  8 23:35:25.209 [conn65] getmore enwiki.revision query: { rev_timestamp: { $lte: 20060201000000.0, $gte: 20060101000000.0 } } cursorid:164394032580746079 ntoreturn:0 keyUpdates:0 numYields: 17411 locks(micros) r:88392121 nreturned:63496 reslen:4194290 261829ms
Sat Jun  8 23:35:25.209 [conn69] getmore enwiki.revision query: { rev_timestamp: { $lte: 20060201000000.0, $gte: 20060101000000.0 } } cursorid:166545066544259763 ntoreturn:0 keyUpdates:0 numYields: 824 locks(micros) r:23376882 nreturned:63496 reslen:4194290 20979ms
Sat Jun  8 23:35:25.210 [conn65] SocketException handling request, closing client connection: 9001 socket exception [2] server [127.0.0.1:37097] 
Sat Jun  8 23:36:01.980 [initandlisten] connection accepted from 127.0.0.1:39182 #70 (2 connections now open)
Sat Jun  8 23:36:50.724 [conn70] end connection 127.0.0.1:39182 (1 connection now open)
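Reading the getmore lines above, each reply carries about 63,000 documents in roughly 4 MB, so the loop presumably runs through one batch from memory and then waits inside mongo.cursor.next until the next getmore returns. A small sketch of that arithmetic, with the numbers copied from the log (the variable names are only illustrative):

```r
# nreturned and reslen copied from the three getmore log lines
nreturned <- c(63397, 63496, 63496)
reslen <- c(4194315, 4194290, 4194290)

# Average size of one returned document, in bytes
bytes_per_doc <- reslen / nreturned

# Size of each reply in MiB
batch_mib <- reslen / 2^20

print(round(bytes_per_doc, 1))
print(round(batch_mib, 2))
```

If that reading is right, the time to look at is the 20979 ms and 261829 ms that the server reports for individual getmore operations, not the per-document work in the R loop.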

However, the script is stuck at item 102, and the server shows the operation still in progress:

> db.currentOp()
{
    "inprog" : [
        {
            "opid" : 221329005,
            "active" : true,
            "secs_running" : 156,
            "op" : "getmore",
            "ns" : "enwiki.revision",
            "query" : {
                "rev_timestamp" : {
                    "$lte" : 20060201000000,
                    "$gte" : 20060101000000
                }
            },
            "client" : "127.0.0.1:37097",
            "desc" : "conn65",
            "threadId" : "0x7f63e219d700",
            "connectionId" : 65,
            "waitingForLock" : false,
            "numYields" : 7268,
            "lockStats" : {
                "timeLockedMicros" : {
                    "r" : NumberLong(105797470),
                    "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                    "r" : NumberLong(156781253),
                    "w" : NumberLong(0)
                }
            }
        }
    ]
}
>

Indexes:

> db.revision.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "enwiki.revision",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "rev_timestamp" : 1
        },
        "ns" : "enwiki.revision",
        "name" : "rev_timestamp_1"
    },
    {
        "v" : 1,
        "key" : {
            "rev_timestamp" : 1,
            "rev_user" : 1
        },
        "ns" : "enwiki.revision",
        "name" : "rev_timestamp_1_rev_user_1"
    }
]
> 

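Is this normal behavior? As far as I know, retrieving documents by key through a cursor should be fast; only the initial find should take a long time.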

insert  query update delete getmore command flushes mapped  vsize    res faults     locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     92   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:06 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     58   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:07 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     65   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:08 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     63   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:10 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     84   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:11 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     93   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:12 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     84   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:13 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     64   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:14 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     85   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:15 
    *0     *0     *0     *0       0     1|0       0    86g   172g  3.76g     90   enwiki:0.0%          0       0|0     1|0    62b     2k     2   00:20:16 

0 answers