12小时前我从LibreNMS监控工具收到一条通知,告知我的12台MongoDB(版本3.2.11)服务器之一的mongo守护程序出现问题(连接时间超过10秒)。我决定忽略它并等待它,我只是假设它有点忙。
几个小时后,当我跑db.currentOp()
时,我有点担心。我看到有一个正在运行migrateThread
的操作,其中包含消息"第2步,共5步"以及带有消息的几个插入"查询不记录(太大)"。
在做了一些互联网搜索后,我发现它可能需要一些,因为它正在将数据块迁移到其他服务器。所以我决定等一下,因为我不想打断它,最终导致生产实例上的2 TB数据被破坏。
现在已经过了12个小时,我开始担心会发生什么。它仍处于5"的第2步,处理器负载非常高,但它似乎仍在移动块并产生新的migrateThread操作以及许多"查询不记录(也是大)"插入
这是我currentOp()
日志的一部分:
{
"desc" : "migrateThread",
"threadId" : "139962853246720",
"active" : true,
"opid" : -2003494368,
"secs_running" : 408,
"microsecs_running" : NumberLong(408914923),
"op" : "none",
"ns" : "data.logs",
"query" : {
},
"msg" : "step 2 of 5",
"numYields" : 0,
"locks" : {
"Global" : "w",
"Database" : "w",
"Collection" : "w"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(37984),
"w" : NumberLong(37982)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(1),
"w" : NumberLong(37981),
"W" : NumberLong(1)
},
"acquireWaitCount" : {
"W" : NumberLong(1)
},
"timeAcquiringMicros" : {
"W" : NumberLong(1446)
}
},
"Collection" : {
"acquireCount" : {
"r" : NumberLong(1),
"w" : NumberLong(37980),
"W" : NumberLong(1)
},
"acquireWaitCount" : {
"W" : NumberLong(1)
},
"timeAcquiringMicros" : {
"W" : NumberLong(3224)
}
}
}
},
{
"desc" : "conn451221",
"threadId" : "139962959451904",
"connectionId" : 451221,
"client" : "10.0.0.111:57408",
"active" : true,
"opid" : -2003439364,
"secs_running" : 0,
"microsecs_running" : NumberLong(37333),
"op" : "insert",
"ns" : "data.logs",
"query" : {
"$msg" : "query not recording (too large)"
},
"numYields" : 0,
"locks" : {
"Global" : "w",
"Database" : "w",
"Collection" : "w"
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(1),
"w" : NumberLong(1)
}
},
"Database" : {
"acquireCount" : {
"w" : NumberLong(1)
}
},
"Collection" : {
"acquireCount" : {
"w" : NumberLong(1)
}
}
}
},
当我检查mongod.log
时,我看到以下内容:
2017-05-04T19:08:14.203Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8858253000066304220 } -> { _id: -8857450400323294366 } for collection data.logs from mongo03:27017 at epoch 56f5410efed7ec477fb62e31
2017-05-04T19:08:14.350Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }, with opId: 2291391315
2017-05-04T19:08:14.350Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
2017-05-04T19:18:26.625Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2017-05-04T19:18:26.625Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
2017-05-04T19:18:36.499Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -8858253000066304220 } -> { _id: -8857450400323294366 }
2017-05-04T19:18:36.788Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:18:36.788+0200-590b7e8c1bc38fe0dd61db45", server: "mongo01", clientAddr: "", time: new Date(1493925516788), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -8858253000066304220 }, max: { _id: -8857450400323294366 }, step 1 of 5: 146, step 2 of 5: 279, step 3 of 5: 611994, step 4 of 5: 0, step 5 of 5: 10162, note: "success" } }
2017-05-04T19:19:04.059Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -9090190725188397877 } -> { _id: -9088854275798899737 } for collection data.logs from mongo04:27017 at epoch 56f5410efed7ec477fb62e31
2017-05-04T19:19:04.063Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }, with opId: 2291472928
2017-05-04T19:19:04.064Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
2017-05-04T19:28:16.709Z I SHARDING [migrateThread] Waiting for replication to catch up before entering critical section
2017-05-04T19:28:16.709Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
2017-05-04T19:28:17.778Z I SHARDING [migrateThread] migrate commit succeeded flushing to secondaries for 'data.logs' { _id: -9090190725188397877 } -> { _id: -9088854275798899737 }
2017-05-04T19:28:17.778Z I SHARDING [migrateThread] about to log metadata event into changelog: { _id: "mongo01-2017-05-04T21:28:17.778+0200-590b80d11bc38fe0dd61db46", server: "mongo01", clientAddr: "", time: new Date(1493926097778), what: "moveChunk.to", ns: "data.logs", details: { min: { _id: -9090190725188397877 }, max: { _id: -9088854275798899737 }, step 1 of 5: 3, step 2 of 5: 4, step 3 of 5: 552641, step 4 of 5: 0, step 5 of 5: 1068, note: "success" } }
2017-05-04T19:28:34.889Z I SHARDING [migrateThread] starting receiving-end of migration of chunk { _id: -8696921045434215002 } -> { _id: -8696381531400161154 } for collection data.logs from mongo06:27017 at epoch 56f5410efed7ec477fb62e31
2017-05-04T19:28:35.134Z I SHARDING [migrateThread] Deleter starting delete for: data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 }, with opId: 2291544986
2017-05-04T19:28:35.134Z I SHARDING [migrateThread] rangeDeleter deleted 0 documents for data.logs from { _id: -8696921045434215002 } -> { _id: -8696381531400161154 }
因此迁移数据需要很长时间。这是我应该担心的吗?我应该采取任何行动还是离开并等待它?
为了清楚起见,我自己并没有开始任何迁移。它本身就发生了。这就是为什么我有点困惑。
请帮忙!
答案 0 :(得分:0)
它解决了自己,只需要等待很长时间。其他服务器以" RangeDeleter"事后的行动现在似乎一切都很好。