MongoDB - Sharding migrateThread randomly running for over 12 hours

Date: 2017-05-04 19:39:34

Tags: mongodb

Twelve hours ago I received a notification from the LibreNMS monitoring tool telling me that the mongo daemon on one of my 12 MongoDB (version 3.2.11) servers was having a problem (connection time exceeding 10 seconds). I decided to ignore it and wait it out; I just assumed it was a bit busy.

A couple of hours later I got a bit worried when I ran db.currentOp(). I saw an operation running a migrateThread with the message "step 2 of 5", together with several inserts carrying the message "query not recording (too large)".
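
For anyone wanting to reproduce the check: something like the following narrows db.currentOp() down to just the migration-related operations (a sketch, run against the affected mongod; the filter fields are taken from the output shown further down):

    // Show only the receiving-end migration thread(s)
    db.currentOp({ "desc": "migrateThread" })

    // Show the inserts on the migrated namespace, i.e. the operations
    // reporting "query not recording (too large)"
    db.currentOp({ "op": "insert", "ns": "data.logs" })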

After some internet searching I found that it could take a while, since it was migrating chunks to other servers. So I decided to wait it out, because I did not want to interrupt it and end up with 2 TB of corrupted data on a production instance.
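
For context, whether the balancer is the one driving these migrations can be checked from a mongos (a sketch; sh.status() also prints the chunk distribution per shard):

    // On a mongos: cluster overview, including chunks per shard
    sh.status()

    // Is balancing enabled, and is a balancing round active right now?
    sh.getBalancerState()
    sh.isBalancerRunning()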

Now 12 hours have passed and I am starting to worry about what is going on. It is still on "step 2 of 5", the processor load is very high, but it still seems to be moving chunks and spawning new migrateThread operations along with many "query not recording (too large)" inserts.

Here is part of my currentOp() output:

    {
        "desc" : "migrateThread",
        "threadId" : "139962853246720",
        "active" : true,
        "opid" : -2003494368,
        "secs_running" : 408,
        "microsecs_running" : NumberLong(408914923),
        "op" : "none",
        "ns" : "data.logs",
        "query" : {

        },
        "msg" : "step 2 of 5",
        "numYields" : 0,
        "locks" : {
            "Global" : "w",
            "Database" : "w",
            "Collection" : "w"
        },
        "waitingForLock" : false,
        "lockStats" : {
            "Global" : {
                "acquireCount" : {
                    "r" : NumberLong(37984),
                    "w" : NumberLong(37982)
                }
            },
            "Database" : {
                "acquireCount" : {
                    "r" : NumberLong(1),
                    "w" : NumberLong(37981),
                    "W" : NumberLong(1)
                },
                "acquireWaitCount" : {
                    "W" : NumberLong(1)
                },
                "timeAcquiringMicros" : {
                    "W" : NumberLong(1446)
                }
            },
            "Collection" : {
                "acquireCount" : {
                    "r" : NumberLong(1),
                    "w" : NumberLong(37980),
                    "W" : NumberLong(1)
                },
                "acquireWaitCount" : {
                    "W" : NumberLong(1)
                },
                "timeAcquiringMicros" : {
                    "W" : NumberLong(3224)
                }
            }
        }
    },
    {
        "desc" : "conn451221",
        "threadId" : "139962959451904",
        "connectionId" : 451221,
        "client" : "10.0.0.111:57408",
        "active" : true,
        "opid" : -2003439364,
        "secs_running" : 0,
        "microsecs_running" : NumberLong(37333),
        "op" : "insert",
        "ns" : "data.logs",
        "query" : {
            "$msg" : "query not recording (too large)"
        },
        "numYields" : 0,
        "locks" : {
            "Global" : "w",
            "Database" : "w",
            "Collection" : "w"
        },
        "waitingForLock" : false,
        "lockStats" : {
            "Global" : {
                "acquireCount" : {
                    "r" : NumberLong(1),
                    "w" : NumberLong(1)
                }
            },
            "Database" : {
                "acquireCount" : {
                    "w" : NumberLong(1)
                }
            },
            "Collection" : {
                "acquireCount" : {
                    "w" : NumberLong(1)
                }
            }
        }
    },

When I check mongod.log, I see the following:

    2017-05-04T19:08:14.203Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } for collection data.logs from mongo03:27017 at epoch 56f5410efed7ec477fb62e31
    2017-05-04T19:08:14.350Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }, with opId: 2291391315
    2017-05-04T19:08:14.350Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
    2017-05-04T19:18:26.625Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
    2017-05-04T19:18:26.625Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
    2017-05-04T19:18:36.499Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
    2017-05-04T19:18:36.788Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:18:36.788+0200-590b7e8c1bc38fe0dd61db45", server: "mongo01", clientAddr: "", time: new Date(1493925516788), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -8858253000066304220 }, max: { _id: -8857450400323294366 }, step 1 of 5: 146, step 2 of 5: 279, step 3 of 5: 611994, step 4 of 5: 0, step 5 of 5: 10162, note: "success" } }
    2017-05-04T19:19:04.059Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } for collection data.logs from mongo04:27017 at epoch 56f5410efed7ec477fb62e31
    2017-05-04T19:19:04.063Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }, with opId: 2291472928
    2017-05-04T19:19:04.064Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
    2017-05-04T19:28:16.709Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
    2017-05-04T19:28:16.709Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
    2017-05-04T19:28:17.778Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
    2017-05-04T19:28:17.778Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:28:17.778+0200-590b80d11bc38fe0dd61db46", server: "mongo01", clientAddr: "", time: new Date(1493926097778), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -9090190725188397877 }, max: { _id: -9088854275798899737 }, step 1 of 5: 3, step 2 of 5: 4, step 3 of 5: 552641, step 4 of 5: 0, step 5 of 5: 1068, note: "success" } }
    2017-05-04T19:28:34.889Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8696921045434215002 } -> { _id: -8696381531400161154 } for collection data.logs from mongo06:27017 at epoch 56f5410efed7ec477fb62e31
    2017-05-04T19:28:35.134Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 }, with opId: 2291544986
    2017-05-04T19:28:35.134Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 }
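
The "moveChunk.to" entries in this log also end up in the cluster's changelog, so the per-step timings can be read back after the fact (a sketch, run from a mongos; the step values are in milliseconds):

    // Last few completed incoming migrations for data.logs,
    // including the "step 1 of 5" ... "step 5 of 5" timings
    db.getSiblingDB("config").changelog.find({ what: "moveChunk.to", ns: "data.logs" }).sort({ time: -1 }).limit(5).pretty()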

So migrating the data is taking a long time. Is this something I should worry about? Should I take any action, or leave it alone and wait it out?

To be clear, I did not start any migration myself. It happened on its own, which is why I am a bit confused.
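
As far as I understand, the balancer starts these chunk migrations automatically whenever the chunk counts between shards drift too far apart, so that would explain it happening "on its own". If I ever did have to intervene, my understanding is that the safe option is disabling the balancer rather than killing the opid (a sketch; an in-flight migration is still allowed to finish):

    // On a mongos: stop NEW migrations from being scheduled
    sh.setBalancerState(false)

    // Verify the flag
    sh.getBalancerState()   // should now return false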

Please help!

1 answer:

Answer 0 (score: 0)

It resolved itself; it just needed a long wait. The other servers followed up afterwards with "RangeDeleter" operations, and everything seems fine now.
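
To double-check that things really settled, counting chunks per shard in the config database should show an even spread (a sketch, run from a mongos):

    // Chunk count per shard for the affected collection
    db.getSiblingDB("config").chunks.aggregate([
        { $match: { ns: "data.logs" } },
        { $group: { _id: "$shard", chunks: { $sum: 1 } } }
    ])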