Service Fabric MultiNode X509 cluster - Timed out waiting for Installer Service to complete

Time: 2017-05-31 10:08:34

Tags: windows azure x509certificate azure-service-fabric azure-virtual-machine

To create an Azure SF test environment, I created three Azure VMs in a DevTest Lab. These are all secured with X509 certificates.

I followed the information here and here.

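The machines are:

  • Windows 2016 Datacenter
  • On the same virtual network
  • All firewalls disabled (each machine can be pinged from another machine)
  • All using the same administrator account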

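I created a self-signed certificate using the certsetup.ps1 file supplied with the documentation. A single certificate is used for both server and cluster, as suggested.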

If I run TestConfiguration.ps1, I get the following output.

LocalAdminPrivilege        : True
IsJsonValid                : True
IsCabValid                 :
RequiredPortsOpen          : True
RemoteRegistryAvailable    : True
FirewallAvailable          : True
RpcCheckPassed             : True
NoConflictingInstallations : True
FabricInstallable          : True
DataDrivesAvailable        : True
Passed                     : True

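Obviously the IsCabValid field is blank, but the Passed field still suggests that installation is possible. I went on to run the next PowerShell command to start the installation.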
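A blank field in this kind of report is easy to overlook when the overall Passed value is True. As a minimal sketch, assuming the report has been saved as plain "Name : Value" text (the helper names below are hypothetical and not part of the Service Fabric tooling), the blank entries can be listed like this:

```python
def parse_test_configuration(output: str) -> dict:
    """Parse 'Name : Value' lines into a dict; blank values become None."""
    results = {}
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        name, _, value = line.partition(":")
        name = name.strip()
        if name:
            results[name] = value.strip() or None
    return results


def blank_fields(results: dict) -> list:
    """Return the names of fields that were reported without a value."""
    return [name for name, value in results.items() if value is None]


# Report text copied from the question above.
SAMPLE_OUTPUT = """\
LocalAdminPrivilege        : True
IsJsonValid                : True
IsCabValid                 :
RequiredPortsOpen          : True
RemoteRegistryAvailable    : True
FirewallAvailable          : True
RpcCheckPassed             : True
NoConflictingInstallations : True
FabricInstallable          : True
DataDrivesAvailable        : True
Passed                     : True
"""

parsed = parse_test_configuration(SAMPLE_OUTPUT)
print("Blank fields:", blank_fields(parsed))
```

This only reports which fields have no value; it does not say whether a blank IsCabValid is actually a problem for the deployment.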

  

.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.X509.MultiMachine.json

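Following the command above, the process starts and the console window fills with the following text, which suggests that inter-node communication is working.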

Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Running Best Practices Analyzer...
Best Practices Analyzer completed successfully.
Creating Service Fabric Cluster...
Processing and validating cluster config.
Configuring nodes.
Default installation directory chosen based on system drive of machine '10.0.0.4'.
Copying installer to all machines.
Configuring machine '10.0.0.4'.
Configuring machine '10.0.0.5'.
Configuring machine '10.0.0.6'.
Machine 10.0.0.6 configured.
Machine 10.0.0.5 configured.
Machine 10.0.0.4 configured.
Running Fabric service installation.
Successfully started FabricInstallerSvc on machine 10.0.0.4
Successfully started FabricInstallerSvc on machine 10.0.0.6
Successfully started FabricInstallerSvc on machine 10.0.0.5

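There is a long pause of several minutes, then a timeout error is shown with no real indication of the cause. I searched the Windows logs on the nodes but could not find any further information. The error shown in the PS console is as follows: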

 Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
CreateCluster Error: System.AggregateException: One or more errors occurred. ---> System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.RunFabricServices(List`1 machines, FabricPackageType fabricPackageType)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
---> (Inner Exception #0) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

---> (Inner Exception #1) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

---> (Inner Exception #2) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Cleaning up faulted installation.
Removing configuration from machine 10.0.0.5
Removing configuration from machine 10.0.0.4
Removing configuration from machine 10.0.0.6

Could an Azure SF enthusiast shed some light on this, or offer any suggestions as to where I'm going wrong?

2 Answers:

Answer 0: (score: 0)

This is a generic failure mode seen when FabricHost fails to come up, which can happen for a number of reasons.

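Since you are using raw Azure VMs rather than an SF VMSS deployment, you must also make sure that the upstream ports set under NodeType in the cluster config are open on each machine. As a sanity test, try deploying an unsecured cluster on these VMs first.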
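To see which ports that covers, one option is to read them out of the cluster config itself. The following is a rough sketch, not an official tool: the endpoint field names and the properties/nodeTypes layout are assumptions based on the sample standalone ClusterConfig JSON files, and the function name is hypothetical, so check both against your own config.

```python
import json

# Assumed endpoint field names (taken from the sample ClusterConfig files);
# adjust this tuple if your config uses different or additional fields.
ENDPOINT_KEYS = (
    "clientConnectionEndpointPort",
    "clusterConnectionEndpointPort",
    "leaseDriverEndpointPort",
    "serviceConnectionEndpointPort",
    "httpGatewayEndpointPort",
    "reverseProxyEndpointPort",
)


def ports_by_node_type(config: dict) -> dict:
    """Map each node type name to the sorted list of its endpoint ports."""
    result = {}
    for node_type in config.get("properties", {}).get("nodeTypes", []):
        ports = {int(node_type[key]) for key in ENDPOINT_KEYS if key in node_type}
        result[node_type.get("name", "<unnamed>")] = sorted(ports)
    return result


# Hypothetical, trimmed-down config fragment used only for illustration.
SAMPLE_CONFIG = json.loads("""
{
  "properties": {
    "nodeTypes": [
      {
        "name": "NodeType0",
        "clientConnectionEndpointPort": "19000",
        "clusterConnectionEndpointPort": "19001",
        "leaseDriverEndpointPort": "19002",
        "serviceConnectionEndpointPort": "19003",
        "httpGatewayEndpointPort": "19080"
      }
    ]
  }
}
""")

print(ports_by_node_type(SAMPLE_CONFIG))
```

For a real file, load it with json.load and pass the result to ports_by_node_type. Any port ranges the config defines are not expanded by this sketch.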

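If that works, then to investigate, run the deployment with the -NoCleanupOnFailure flag and, on one of the failing machines, check the events logged under "Applications and Services Logs > Microsoft-Service Fabric > Admin".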

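The error/warning logs should indicate whether there is a problem reading the certificate or some other blocking issue. Check that the certificates have been ACL'd to Network Service on each machine, as this is one of the requirements listed in the doc.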

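One of the other common failures occurs when the cert thumbprint contains an invalid character. There is a bug in the Windows certificate management tool that causes the displayed thumbprint to contain a hidden invalid character, which causes deployment issues when copied directly into the config. Please verify with a hex editor (e.g. HxD) that the thumbprints in the config contain only valid characters.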
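As an alternative to a hex editor, the same check can be scripted. This is a minimal sketch with hypothetical function names and a made-up thumbprint; U+200E is used purely as an illustrative hidden character, since the answer does not say which character is involved.

```python
import string

# Characters allowed in a thumbprint: hexadecimal digits only.
HEX_DIGITS = set(string.hexdigits)


def invalid_thumbprint_chars(thumbprint: str) -> list:
    """Return (index, code point) for every character that is not a hex digit."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(thumbprint)
        if ch not in HEX_DIGITS
    ]


def clean_thumbprint(thumbprint: str) -> str:
    """Drop every non-hex character and normalize to upper case."""
    return "".join(ch for ch in thumbprint if ch in HEX_DIGITS).upper()


# Made-up 40-character thumbprint with an invisible character in front.
SAMPLE = "\u200e0123456789ABCDEF0123456789ABCDEF01234567"

print(invalid_thumbprint_chars(SAMPLE))
print(clean_thumbprint(SAMPLE))
```

Note that spaces are reported as invalid too, so a thumbprint pasted with spaces between byte pairs will also be flagged.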

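If this does not give you enough information to resolve the issue, run the log collector tool from Tools\Microsoft.Azure.ServiceFabric.WindowsServer.SupportPackage.zip, included in the Standalone package, and upload the collected logs to a storage location of your choice to share with our team. You can mail the link to sfsa@microsoft.com and we can help look into this.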

Answer 1: (score: 0)

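For the cluster/server/reverseProxy certificates: 1) their private keys need to be ACL'd so that "Network Service" has permission to load them, and 2) their CA certificates need to be added to TrustedRoot.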