MongoDB 回滚和副本集问题

时间:2021-04-27 01:53:04

标签: mongodb replicaset mongodb-replica-set

我们有一个三成员的 MongoDB 副本集。其中一个节点正在尝试回滚并失败。因此,Crashloop 中的这个 Mongo POD。什么是 MongoDB 中的“稳定时间戳”? Oplog的“顶”是什么?

initiate system update and upgrade stuck in mongodb
2021-04-08T06:51:06.690+0000 I STORAGE [initandlisten] WiredTiger record store oplog processing took 554ms
2021-04-08T06:51:06.693+0000 I STORAGE [initandlisten] Timestamp monitor starting
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten]
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2021-04-08T06:51:06.697+0000 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2021-04-08T06:51:06.698+0000 I CONTROL [initandlisten]
2021-04-08T06:51:06.718+0000 I SHARDING [initandlisten] Marking collection local.system.replset as collection version: <unsharded>
2021-04-08T06:51:06.720+0000 I STORAGE [initandlisten] Flow Control is enabled on this deployment.
2021-04-08T06:51:06.720+0000 I SHARDING [initandlisten] Marking collection admin.system.roles as collection version: <unsharded>
2021-04-08T06:51:06.720+0000 I SHARDING [initandlisten] Marking collection admin.system.version as collection version: <unsharded>
2021-04-08T06:51:06.722+0000 I SHARDING [initandlisten] Marking collection local.startup_log as collection version: <unsharded>
2021-04-08T06:51:06.722+0000 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/var/data/mongodb/diagnostic.data'
2021-04-08T06:51:06.723+0000 I SHARDING [initandlisten] Marking collection local.replset.minvalid as collection version: <unsharded>
2021-04-08T06:51:06.723+0000 I SHARDING [initandlisten] Marking collection local.replset.election as collection version: <unsharded>
2021-04-08T06:51:06.725+0000 I REPL [initandlisten] Rollback ID is 2
2021-04-08T06:51:06.726+0000 I REPL [initandlisten] Recovering from stable timestamp: Timestamp(1617686958, 1) (top of oplog:
{ ts: Timestamp(1617686416, 1), t: 32 }
, appliedThrough:
{ ts: Timestamp(1617686958, 1), t: 36 }
, TruncateAfter: Timestamp(0, 0))
2021-04-08T06:51:06.726+0000 I REPL [initandlisten] Starting recovery oplog application at the stable timestamp: Timestamp(1617686958, 1)
2021-04-08T06:51:06.726+0000 F REPL [initandlisten] Applied op { : Timestamp(1617686958, 1) } not found. Top of oplog is { : Timestamp(1617686416, 1) }.
2021-04-08T06:51:06.726+0000 F - [initandlisten] Fatal Assertion 40313 at src/mongo/db/repl/replication_recovery.cpp 511
2021-04-08T06:51:06.726+0000 F - [initandlisten] \n\n***aborting after fassert() failure\n\n

更新: 确切地说,我试图理解,这里提到的稳定时间戳是什么。为什么 oplog 的顶部很重要?

从稳定时间戳中恢复:Timestamp(1617686958, 1)(oplog 顶部: { ts: 时间戳(1617686416, 1), t: 32 } ,应用通过: { ts: 时间戳(1617686958, 1), t: 36 } , TruncateAfter: Timestamp(0, 0))

为什么这里的断言会失败?

2021-04-08T06:51:06.726+0000 F REPL [initandlisten] Applied op { : Timestamp(1617686958, 1) } not found. Top of oplog is { : Timestamp(1617686416, 1) }.
2021-04-08T06:51:06.726+0000 F - [initandlisten] Fatal Assertion 40313 at src/mongo/db/repl/replication_recovery.cpp 511

1 个答案:

答案 0 :(得分:4)

系统尝试进行回滚,但所需的条目不在该 opLog 中。

最好让该节点进行完全初始化。只需删除所有数据目录文件并启动 mongod。该节点将自动从最近的完全同步节点复制所有数据。