Question

设定：
1个主要，3个辅助和1个仲裁（全部5个正在运行。）

当我在primary中执行stepDown（）时，在shell中获得以下错误 -

m101:PRIMARY> rs.stepDown() 
2017-06-07T15:01:21.357 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'replSetStepDown' on host '127.0.0.1:27018'  :  
DB.prototype.runCommand@src/mongo/shell/db.js:132:1 
DB.prototype.adminCommand@src/mongo/shell/db.js:150:16  
rs.stepDown@src/mongo/shell/utils.js:1261:12
@(shell):1:1 
2017-06-07T15:01:21.360 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27018 (127.0.0.1) failed
2017-06-07T15:01:21.361 I NETWORK  [thread1] reconnect 127.0.0.1:27018 (127.0.0.1) ok
m101:SECONDARY>

请注意，主要内容最终会更改为辅助。但为什么我收到此错误，＆＃34;失败：网络错误＆＃34;这里？

以下是已降级的主要日志文件中的部分 -

2017-06-07T15:01:22.170 I REPL     [replication-5] Restarting oplog query due to error: InterruptedDueToReplStateChange: operation was interrupted. Last fetched optime (with hash): { ts: Timestamp 1496827872000|1, t: 2 }[-3490433912114125886]. Restarts remaining: 3
2017-06-07T15:01:22.170 I REPL     [replication-5] Scheduled new oplog query Fetcher source: localhost:27018 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1496827872000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } query metadata: { $replData: 1, $oplogQueryData: 1, $ssm: { $secondaryOk: true } } active: 1 timeout: 65000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 159434 -- target:localhost:27018 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1496827872000|1 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000, term: 2 } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2017-06-07T15:01:22.173 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: Sync source's last applied OpTime { ts: Timestamp 1496827872000|1, t: 2 } is not greater than our last fetched OpTime { ts: Timestamp 1496827872000|1, t: 2 }. Choosing new sync source.
2017-06-07T15:01:22.176 I REPL     [rsBackgroundSync] could not find member to sync from
2017-06-07T15:01:22.180 I REPL     [ReplicationExecutor] Member localhost:27018 is now in state SECONDARY
2017-06-07T15:01:27.335 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to localhost:27018: InvalidSyncSource: Sync source was cleared. Was localhost:27018
2017-06-07T15:01:29.879 I -        [conn2] end connection 127.0.0.1:59400 (5 connections now open)
2017-06-07T15:01:30.899 I NETWORK  [thread1] connection accepted from 127.0.0.1:64421 #28 (5 connections now open)
2017-06-07T15:01:30.899 I NETWORK  [conn28] received client metadata from 127.0.0.1:64421 conn28: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:30.917 I NETWORK  [thread1] connection accepted from 127.0.0.1:64425 #29 (6 connections now open)
2017-06-07T15:01:30.920 I NETWORK  [conn29] received client metadata from 127.0.0.1:64425 conn29: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:30.927 I NETWORK  [thread1] connection accepted from 127.0.0.1:64428 #30 (7 connections now open)
2017-06-07T15:01:30.928 I NETWORK  [conn30] received client metadata from 127.0.0.1:64428 conn30: { driver: { name: "NetworkInterfaceASIO-Replication", version: "3.4.4" }, os: { type: "Windows", name: "Microsoft Windows 8", architecture: "x86_64", version: "6.2 (build 9200)" } }
2017-06-07T15:01:31.561 I REPL     [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2017-06-07T15:01:31.562 I REPL     [ReplicationExecutor] conducting a dry run election to see if we could be elected
2017-06-07T15:01:31.563 I REPL     [ReplicationExecutor] VoteRequester(term 2 dry run) received a no vote from localhost:27018 with reason "candidate's term is lower than mine"; response message: { term: 3, voteGranted: false, reason: "candidate's term is lower than mine", ok: 1.0 }
2017-06-07T15:01:31.563 I REPL     [ReplicationExecutor] not running for primary, we have been superceded already
2017-06-07T15:01:32.183 I REPL     [ReplicationExecutor] Member localhost:27019 is now in state PRIMARY
2017-06-07T15:01:32.185 I REPL     [rsBackgroundSync] sync source candidate: localhost:27019
2017-06-07T15:01:32.186 I ASIO     [NetworkInterfaceASIO-RS-0] Connecting to localhost:27019
2017-06-07T15:01:33.198 I ASIO     [NetworkInterfaceASIO-RS-0] Successfully connected to localhost:27019, took 1012ms (1 connections now open to localhost:27019)

试着了解这里发生了什么。任何线索都会受到赞赏。

提前致谢！

Answer 1

这是预期的行为。当主要步骤关闭时，它会丢弃所有连接，包括mongo shell中的活动连接。然后mongo shell重新建立连接（根据＆＃34;尝试重新连接＆＃34;消息）。

rs.stepDown()文档在这一点上可能更清楚，有一个shell输出示例，但相关描述是：

成功降级后，rs.stepDown（）会强制当前连接到数据库的所有客户端断开连接。这有助于确保客户端保持副本集的准确视图。

由于断开连接包含用于运行命令的连接，因此如果命令成功完成，则无法检索命令的返回状态。即，如果错误，您只能检索命令的返回状态。在脚本中运行该命令时，脚本应考虑此行为。

尝试降级mongodb副本集时出错

1 个答案: