我们有一个带有3个分片的mongoDb集群,每个分片是一个包含3个节点的副本集,我们使用的mongoDb版本是3.2.6。我们有一个大小约230G的大型数据库,其中包含大约5500个集合。我们发现大约2300个集合不平衡,其他3200个集合均匀分布到3个分片。
下面是sh.status的结果(整个结果太大了,我只发布了部分内容):
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"minCompatibleVersion" : 5,
"currentVersion" : 6,
"clusterId" : ObjectId("57557345fa5a196a00b7c77a")
}
shards:
{ "_id" : "shard1", "host" : "shard1/10.25.8.151:27018,10.25.8.159:27018" }
{ "_id" : "shard2", "host" : "shard2/10.25.2.6:27018,10.25.8.178:27018" }
{ "_id" : "shard3", "host" : "shard3/10.25.2.19:27018,10.47.102.176:27018" }
active mongoses:
"3.2.6" : 1
balancer:
Currently enabled: yes
Currently running: yes
Balancer lock taken at Sat Sep 03 2016 09:58:58 GMT+0800 (CST) by iZ23vbzyrjiZ:27017:1467949335:-2109714153:Balancer
Collections with active migrations:
bdtt.normal_20131017 started at Sun Sep 18 2016 17:03:11 GMT+0800 (CST)
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
1490 : Failed with error 'aborted', from shard2 to shard3
1490 : Failed with error 'aborted', from shard2 to shard1
14 : Failed with error 'data transfer error', from shard2 to shard1
databases:
{ "_id" : "bdtt", "primary" : "shard2", "partitioned" : true }
bdtt.normal_20160908
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 142
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160909
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 36
shard2 42
shard3 46
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160910
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 34
shard2 32
shard3 32
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160911
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard1 30
shard2 32
shard3 32
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160912
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 126
too many chunks to print, use verbose if you want to force print
bdtt.normal_20160913
shard key: { "_id" : "hashed" }
unique: false
balancing: true
chunks:
shard2 118
too many chunks to print, use verbose if you want to force print
}
Collection" normal_20160913"不平衡,我发布下面这个集合的getShardDistribution()结果:
mongos> db.normal_20160913.getShardDistribution()
Shard shard2 at shard2/10.25.2.6:27018,10.25.8.178:27018
data : 4.77GiB docs : 203776 chunks : 118
estimated data per chunk : 41.43MiB
estimated docs per chunk : 1726
Totals
data : 4.77GiB docs : 203776 chunks : 118
Shard shard2 contains 100% data, 100% docs in cluster, avg obj size on shard : 24KiB
平衡器进程处于运行状态,并且chunksize是默认值(64M):
mongos> sh.isBalancerRunning()
true
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "chunksize", "value" : NumberLong(64) }
{ "_id" : "balancer", "stopped" : false }
我从mogos日志中发现了很多moveChunk错误,这可能是为什么有些收藏不平衡的原因,这里是最新版本的部分:
2016-09-19T14:25:25.427+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.620+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:25:59.644+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.701+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:35:02.728+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.232+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:18.256+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.101+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:42:27.112+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
2016-09-19T14:43:41.889+0800 I SHARDING [conn37136926] moveChunk result: { ok: 0.0, errmsg: "Not starting chunk migration because another migration is already in progress", code: 117 }
我尝试手动使用moveChunk命令,它会返回相同的错误:
mongos> sh.moveChunk("bdtt.normal_20160913", {_id:ObjectId("57d6d107edac9244b6048e65")}, "shard3")
{
"cause" : {
"ok" : 0,
"errmsg" : "Not starting chunk migration because another migration is already in progress",
"code" : 117
},
"code" : 117,
"ok" : 0,
"errmsg" : "move failed"
}
我不确定是否创建了太多会导致迁移不堪重负的集合?每天将创建约60-80个新馆藏。
我在这里需要帮助来回答以下问题,任何提示都会很棒:
答案 0 :(得分:3)
回答我自己的问题: 最后我们找到了根本原因,这是一个与异常副本集配置引起的“MongoDB balancer timeout with delayed replica”完全相同的问题。 发生此问题时,我们的副本集配置如下:
shard1:PRIMARY> rs.conf()
{
"_id" : "shard1",
"version" : 3,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "10.25.8.151:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "10.25.8.159:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "10.25.2.6:37018",
"arbiterOnly" : true,
"buildIndexes" : true,
"hidden" : false,
"priority" : 1,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 3,
"host" : "10.47.114.174:27018",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : true,
"priority" : 0,
"tags" : {
},
"slaveDelay" : NumberLong(86400),
"votes" : 1
}
],
"settings" : {
"chainingAllowed" : true,
"heartbeatIntervalMillis" : 2000,
"heartbeatTimeoutSecs" : 10,
"electionTimeoutMillis" : 10000,
"getLastErrorModes" : {
},
"getLastErrorDefaults" : {
"w" : 1,
"wtimeout" : 0
},
"replicaSetId" : ObjectId("5755464f789c6cd79746ad62")
}
}
副本集内有4个节点:一个主节点,一个从节点,一个仲裁节点和一个24小时延迟的从节点。这使得3个节点占多数,因为仲裁器没有数据存在,平衡器需要等待延迟的从机满足写入关注(确保接收器分片已经接收到块)。
有几种方法可以解决这个问题。我们刚刚删除仲裁器,平衡器现在工作正常。
答案 1 :(得分:0)
我要推测,但我的猜测是你的收藏非常不平衡,目前正在通过大块迁移来平衡(可能需要很长时间)。因此,您的手动块迁移已排队但不会立即执行。
以下几点可能会澄清一点:
您拥有的一个选项是在收藏中执行一些pre-splitting。预拆分过程基本上配置了一个空集合以开始平衡并避免首先失衡。因为一旦它们变得不平衡,块迁移过程可能不是你的朋友。
此外,您可能想要重新访问分片键。你可能在你的分片键上做错了什么导致了很多不平衡。
另外,您的数据大小对我来说似乎并不太大,无法保证分片配置。请记住,除非您受到数据大小/工作集大小属性的强制,否则永远不要进行分片配置。因为分片不是免费的(你可能已经感觉到了痛苦)。