新部署的应用程序的凭据被MongoDB拒绝,其中"身份验证失败"。 MongoDB Ops Manager停留在" AdjustUsers"已经好几个小时了。
验证:
cf service-connector 8080 opsmanager.service.consul:8080
在http://localhost:8080打开浏览器并使用在门户网站上获得的mongodb服务密钥登录:
"ops_manager_url": "http://opsmanager.service.consul:8080",
"ops_manager_user": "xxx",
"ops_manager_password": "xxx",
我最近使用mongorestore恢复了大约20GB的数据:
cf service-connector 27020 xxx-0.service.consul:33602
mongorestore --gzip --port 27020 --username xxx --password xxx --db rs_xxx /backup/mongodump/ > mongorestore.log 2>&1 &
其他已经存在的应用程序没有问题。只有我使用Blue-Green部署部署的应用程序无法连接:
cf push $APP-new
cf map-route $APP-new $DOMAIN --hostname $APP
cf unmap-route $APP $DOMAIN --hostname $APP || true
cf unmap-route $APP-new $DOMAIN --hostname $APP-new
cf delete -f $APP
cf rename $APP-new $APP
在我的应用中,我没有指定WriteConcern,所以我认为它只是主要的(" w:1")
答案 0 :(得分:2)
每cf bind-service
或cf unbind-service
Servicebroker(Cloud Foundry组件)都会创建新的随机生成凭据(请参阅cf env $APP
)。通过我们的MongoEnterprise服务产品,Service Broker连接到Ops Manager API并将新用户部署到mongod replicaset。
当Ops Manager部署用户(createUser
)时,他使用WriteConcern" most",这意味着至少有一个辅助需要确认写入。
COMMAND [conn53768] command rs_$DBNAME.$cmd command: createUser { createUser: "$USERNAME", pwd: "xxx", digestPassword: false, roles: [ { role: "readWrite", db: "rs_$DBNAME" } ], writeConcern: { w: "majority" } } keyUpdates:0 writeConflicts:0 numYields:0 reslen:138 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { W: 2 } }, Collection: { acquireCount: { w: 2 } } } protocol:op_query 9811ms
由于问题发生前几个小时执行的mongorestore
,createUser
命令有可能由于缺少来自忙碌的辅助设备的确认而超时在前景和背景中建立索引。索引构建(默认mongorestore
选项)首先在主要上完成,完成后将在辅助中完成。这解释了createUser
期间辅助词的滞后增长。
索引构建可能需要很长时间,具体取决于索引和数据大小。我们听到很多客户抱怨索引构建需要太长时间。
以下是索引构建的日志。在Ops Manager中,客户可以看到流式mongodb.log
(opsmanager.service.consul),并且所有副本集成员都有一点延迟。
...
2017-02-15T10:40:06.199+0000 I INDEX [repl writer worker 10] build index done. scanned 1108917 total records. 672 secs
2017-02-15T10:50:08.553+0000 I INDEX [repl writer worker 4] build index done. scanned 1108917 total records. 602 secs
2017-02-15T11:01:13.888+0000 I INDEX [repl writer worker 7] build index done. scanned 1108917 total records. 665 secs
...
2017-02-15T15:01:37.405+0000 I INDEX [repl index builder 176] build index done. scanned 1109531 total records. 659 secs
2017-02-15T15:01:37.406+0000 I INDEX [repl index builder 170] build index done. scanned 1109531 total records. 659 secs
2017-02-15T15:16:20.139+0000 I INDEX [repl index builder 170] build index done. scanned 1109699 total records. 882 secs
这些是来自Automation Agent的错误(Ops Manager通过HTTP与受管VM上的Autoamtion Agent进行通话,然后Automation Agent会使用本机协议与mongod通话)
[2017/02/15 13:33:36.297] [.error] [cm/mongoctl/authctl.go:updateUser26Style:697] [101] <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.297] [.error] [cm/mongoctl/authctl.go:UpsertUser:516] [101] <rs_$DB_NAME> [13:33:36.297] Error upserting 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) : <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.297] [.error] [cm/action/adjustusers.go:adjustUsers:52] [101] <rs_$DB_NAME> [13:33:36.297] Error upserting user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2). Trying to proceed though. : <rs_$DB_NAME> [13:33:36.297] Error upserting 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) : <rs_$DB_NAME> [13:33:36.297] Error updating 2.6-style user = AuthUser($USER@rs_$DB_NAME,roles=AuthRoles(1:{"Rolename":"readWrite","Db":"rs_$DB_NAME"}),interned=true,identity=2) in db = rs_$DB_NAME : <rs_$DB_NAME> [13:33:36.297] Timed out (timeout=-1ns) trying to runCommandWithTimeout(dbName=rs_$DB_NAME, cmd=[{"Name":"updateUser","Value":"$USER"},{"Name":"pwd","Value":"$PASSWORD"},{"Name":"digestPassword","Value":false},{"Name":"roles","Value":[{"db":"rs_$DB_NAME","role":"readWrite"}]}])
[2017/02/15 13:33:36.381] [.error] [cm/executor/executor.go:ExecutePlan:184] <rs_$DB_NAME> [13:33:36.381] Postcondition failed for step AdjustUsers because
[The value of 'currentState.UsersRight' = false, but it should be true]. Outcome=3
总结:WriteConcern primary的所有内容都运行良好,但WriteConcern大部分时间超时,阻止索引构建。
答案 1 :(得分:1)
注意到数据库绑定不成功的一种方法是在取消映射并删除旧应用程序之前测试新部署的应用程序(应用程序级别的运行状况检查):
cf push $APP-new
# only old app active
# test new app
curl --fail --silent --output /dev/null https://$APP-new.$DOMAIN/status
cf map-route $APP-new $DOMAIN --hostname $APP
# both apps active
cf apps
cf routes
cf unmap-route $APP $DOMAIN --hostname $APP || true
# only new app active
cf unmap-route $APP-new $DOMAIN --hostname $APP-new
cf delete -f $APP
cf rename $APP-new $APP
cf apps
此示例假定应用名称与主机名相同。