Azure Service Fabric MultiMachine Windows X509 cluster - Timed out waiting for Installer Service to complete for machine vm1

Time: 2018-02-14 16:22:51

Tags: azure azure-active-directory x509certificate azure-service-fabric windows-server

I need some advice; any help is greatly appreciated.

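I am trying to create a standalone Service Fabric cluster by following this guide:

https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-creation-for-windows-server

Specifically, I am using the Windows.X509.MultiMachine configuration.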

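I have an Active Directory domain and its certificates: one machine acts as the domain controller, and there are three nodes on which I want to create the cluster.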

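The TestConfiguration.ps1 PowerShell script reports that everything is fine, but CreateServiceFabricCluster.ps1 runs for a long time and then fails with the error "Timed out waiting for Installer Service to complete for machine DEV1", and the same for the other machines. The DiagnosticsStore is empty.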

DeploymentTrace:

2018/02/14-13:58:51.837,Info,5084,SystemFabricDeployer.SFDeployer,Running Best Practices Analyzer...
2018/02/14-13:58:51.844,Verbose,5084,SystemFabricDeployer.SFDeployer,Validating executing user is an Administrator.
2018/02/14-13:58:51.850,Verbose,5084,SystemFabricDeployer.SFDeployer,Converting JSON config to model.
2018/02/14-13:58:52.200,Error,5084,SystemFabricDeployer.SFDeployer,Config validation: Server Certificate Thumbprint contains invalid characters
2018/02/14-13:58:52.201,Verbose,5084,SystemFabricDeployer.SFDeployer,Validating CAB file is valid at C:\Users\Administrator\Documents\mssf\DeploymentRuntimePackages\MicrosoftAzureServiceFabric.6.1.456.9494.cab.
2018/02/14-13:58:52.591,Error,5084,SystemFabricDeployer.SFDeployer,Best Practices Analyzer determined environment has an issue. Please see additional BPA log output in DeploymentTraces folder.
2018/02/14-13:58:52.592,Error,5084,SystemFabricDeployer.SFDeployer,Cluster Setup cancelled due to validation error(s) found by Best Practices Analyzer. Inspect details in DeploymentTraces log folder local to executing location.
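
The "Server Certificate Thumbprint contains invalid characters" entry in the trace above points at a thumbprint value the deployer could not parse. This often happens when a thumbprint is pasted with spaces (as in the template's own metadata sample) or with an invisible character picked up while copying it from the certificate dialog. Below is a minimal sketch of a cleanup step, assuming a SHA-1 thumbprint of 40 hex digits. ConvertTo-CleanThumbprint is a hypothetical helper name and is not part of the Service Fabric scripts.

```powershell
# Hypothetical helper (not part of the Service Fabric tooling):
# remove every character that is not a hexadecimal digit, then
# check that exactly 40 characters (a SHA-1 thumbprint) remain.
function ConvertTo-CleanThumbprint {
    param([Parameter(Mandatory)][string]$Thumbprint)

    $clean = ($Thumbprint -replace '[^0-9A-Fa-f]', '').ToUpperInvariant()
    if ($clean.Length -ne 40) {
        throw "Expected 40 hex characters, got $($clean.Length)."
    }
    return $clean
}

# The sample value from the template's metadata string, spaces included.
ConvertTo-CleanThumbprint 'd5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64'
# -> D5EC423B79CBE507FD83593C56B9D53124254264
```

The cleaned value is what would go into a CertificateThumbprint field in ClusterConfig.json. A config that identifies certificates by common name does not need this step.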

Event log:

2/14/2018 6:52:33 AM - DEV1 - Error - Timed out waiting for Installer Service to complete for machine DEV1. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
2/14/2018 6:52:33 AM - DEV1 - Error - Timed out waiting for Installer Service to complete for machine DEV2. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
2/14/2018 6:52:33 AM - DEV1 - Error - Timed out waiting for Installer Service to complete for machine DEV3. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
2/14/2018 6:52:49 AM - DEV1 - Error - federation open failed with FABRIC_E_TIMEOUT
2/14/2018 6:52:49 AM - DEV1 - Error - Fabric Node open failed with error code = FABRIC_E_TIMEOUT
2/14/2018 6:52:52 AM - DEV1 - Error - Target information file exists. This would indicate that Fabric node open or Fabric uninstall didn't happen successfully. Rolling back..
2/14/2018 6:57:50 AM - DEV1 - Error - federation open failed with FABRIC_E_TIMEOUT
2/14/2018 6:57:50 AM - DEV1 - Error - Fabric Node open failed with error code = FABRIC_E_TIMEOUT
2/14/2018 7:02:51 AM - DEV1 - Error - federation open failed with FABRIC_E_TIMEOUT
2/14/2018 7:02:51 AM - DEV1 - Error - Fabric Node open failed with error code = FABRIC_E_TIMEOUT
2/14/2018 7:04:24 AM - DEV1 - Error - CreateCluster Error: System.AggregateException: One or more errors occurred. ---> System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine DEV3. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
   at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.RunFabricServices(List`1 machines, FabricPackageType fabricPackageType)
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__1.MoveNext()
---> (Inner Exception #0) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine DEV3. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

---> (Inner Exception #1) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine DEV1. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

---> (Inner Exception #2) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine DEV2. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
   at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
   at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
   at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
   at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---

My resolvable node names are DEV1, DEV2, and DEV3; the domain is "hp.dev" and its NetBIOS name is HPDEV. In case it helps, all machines run Windows Server 2016 Standard.

Here is the ClusterConfig.json:

{
"name": "hP Secure Cluster",
"clusterConfigurationVersion": "1.0.0",
"apiVersion": "10-2017",
"nodes": [
    {
        "nodeName": "Node1",
        "iPAddress": "DEV1",
        "nodeTypeRef": "NodeTypeDefault",
        "faultDomain": "fd:/dc1/r1",
        "upgradeDomain": "UD1"
    },
    {
        "nodeName": "Node2",
        "iPAddress": "DEV2",
        "nodeTypeRef": "NodeTypeDefault",
        "faultDomain": "fd:/dc1/r2",
        "upgradeDomain": "UD2"
    },
    {
        "nodeName": "Node3",
        "iPAddress": "DEV3",
        "nodeTypeRef": "NodeTypeDefault",
        "faultDomain": "fd:/dc1/r3",
        "upgradeDomain": "UD3"
    }
],
"properties": {
    "diagnosticsStore": {
        "metadata": "Please replace the diagnostics file share with an actual file share accessible from all cluster machines.",
        "dataDeletionAgeInDays": "7",
        "storeType": "FileShare",
        "connectionstring": "C:\\ProgramData\\SF\\DiagnosticsStore"
    },
    "security": {
        "metadata": "The Credential type X509 indicates this is cluster is secured using X509 Certificates. The thumbprint format is - d5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64.",
        "ClusterCredentialType": "Windows",
        "ServerCredentialType": "X509",
        "WindowsIdentities": {
            "ClusterIdentity": "HPDEV\\Administrator"
        },
        "CertificateInformation": {
            "ServerCertificateCommonNames": {
                "CommonNames": [
                    {
                        "CertificateCommonName": "HPCA",
                    }
                ],
                "X509StoreName": "My"
            }
        }
    },
    "nodeTypes": [
        {
            "name": "NodeTypeDefault",
            "clientConnectionEndpointPort": "19000",
            "clusterConnectionEndpointPort": "19001",
            "leaseDriverEndpointPort": "19002",
            "serviceConnectionEndpointPort": "19003",
            "httpGatewayEndpointPort": "19080",
            "reverseProxyEndpointPort": "30000",
            "applicationPorts": {
                "startPort": "20001",
                "endPort": "20031"
            },
            "ephemeralPorts": {
                "startPort": "20032",
                "endPort": "20287"
            },
            "isPrimary": true
        }
    ],
    "fabricSettings": [
        {
            "name": "Setup",
            "parameters": [
                {
                    "name": "FabricDataRoot",
                    "value": "C:\\ProgramData\\SF"
                },
                {
                    "name": "FabricLogRoot",
                    "value": "C:\\ProgramData\\SF\\Log"
                }
            ]
        }
    ]
}

Any ideas? Thanks in advance.

1 Answer:

Answer 0 (score: 0)

I have solved the problem. Here is a working example.

Cluster configuration:

ClusterConfig.json

{
  "name": "SampleCluster",
  "clusterConfigurationVersion": "1.0.0",
  "apiVersion": "10-2017",
  "nodes": [
    {
      "nodeName": "vm0",
      "iPAddress": "HPSFSEC0",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/r0",
      "upgradeDomain": "UD0"
    },
    {
      "nodeName": "vm1",
      "iPAddress": "HPSFSEC1",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/r1",
      "upgradeDomain": "UD1"
    },
    {
      "nodeName": "vm2",
      "iPAddress": "HPSFSEC2",
      "nodeTypeRef": "NodeType0",
      "faultDomain": "fd:/dc1/r2",
      "upgradeDomain": "UD2"
    }
  ],
  "properties": {
    "diagnosticsStore": {
      "metadata": "Please replace the diagnostics file share with an actual file share accessible from all cluster machines. For example, \\\\machine1\\DiagnosticsStore.",
      "dataDeletionAgeInDays": "3",
      "storeType": "FileShare",
      "connectionstring": "\\\\HPSFSEC0\\DiagnosticsStore"
    },
    "security": {
      "metadata": "The Credential type X509 indicates this is cluster is secured using X509 Certificates. The thumbprint format is - d5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64.",
      "ClusterCredentialType": "X509",
      "ServerCredentialType": "X509",
      "CertificateInformation": {
        "ClusterCertificateCommonNames": {
          "CommonNames": [
            {
              "CertificateCommonName": "FS-ClusterCert"
            }
          ],
          "X509StoreName": "My"
        },
        "ServerCertificateCommonNames": {
          "CommonNames": [
            {
              "CertificateCommonName": "FS-ServerCert"
            }
          ],
          "X509StoreName": "My"
        },
        "ClientCertificateThumbprints": [
          {
            "CertificateThumbprint": "C862B5CA4033B49F044EFFC47A4C1AE5158D72CF",
            "IsAdmin": true
          }
        ]
      }
    },
    "nodeTypes": [
      {
        "name": "NodeType0",
        "clientConnectionEndpointPort": "19000",
        "clusterConnectionEndpointPort": "19001",
        "leaseDriverEndpointPort": "19002",
        "serviceConnectionEndpointPort": "19003",
        "httpGatewayEndpointPort": "19080",
        "reverseProxyEndpointPort": "19081",
        "applicationPorts": {
          "startPort": "20001",
          "endPort": "20031"
        },
        "isPrimary": true
      }
    ],
    "fabricSettings": [
      {
        "name": "Setup",
        "parameters": [
          {
            "name": "FabricDataRoot",
            "value": "C:\\ProgramData\\SF"
          },
          {
            "name": "FabricLogRoot",
            "value": "C:\\ProgramData\\SF\\Log"
          }
        ]
      }
    ]
  }
}
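
One visible difference from the configuration in the question is that diagnosticsStore.connectionstring now points at a UNC share instead of a local folder, which matches the template's note that the share must be reachable from all cluster machines. Below is a rough sketch of a pre-flight check for that and for dangling nodeTypeRef values. Test-ClusterConfigShape is a hypothetical helper, not part of the Service Fabric package, and the inline JSON is a trimmed stand-in rather than a full config.

```powershell
# Hypothetical helper: returns a list of problems found in a parsed ClusterConfig object.
function Test-ClusterConfigShape {
    param([Parameter(Mandatory)]$Config)

    $problems = @()
    $typeNames = @($Config.properties.nodeTypes | ForEach-Object { $_.name })

    # Every node must reference a node type that is actually defined.
    foreach ($node in $Config.nodes) {
        if ($typeNames -notcontains $node.nodeTypeRef) {
            $problems += "Node '$($node.nodeName)' references unknown node type '$($node.nodeTypeRef)'."
        }
    }

    # A FileShare diagnostics store is expected to be a UNC path (\\machine\share).
    $store = $Config.properties.diagnosticsStore
    $conn = [string]$store.connectionstring
    if ($store.storeType -eq 'FileShare' -and -not $conn.StartsWith('\\')) {
        $problems += "diagnosticsStore.connectionstring '$conn' is not a UNC path."
    }

    return ,$problems
}

# Trimmed stand-in config that reuses the local-folder setting from the question.
$sample = @'
{
  "nodes": [ { "nodeName": "vm0", "nodeTypeRef": "NodeType0" } ],
  "properties": {
    "diagnosticsStore": { "storeType": "FileShare", "connectionstring": "C:\\ProgramData\\SF\\DiagnosticsStore" },
    "nodeTypes": [ { "name": "NodeType0" } ]
  }
}
'@ | ConvertFrom-Json

Test-ClusterConfigShape -Config $sample
```

To check a real file, the parsed object could come from Get-Content -Raw .\ClusterConfig.json | ConvertFrom-Json instead of the inline sample. This does not replace TestConfiguration.ps1; it only covers the two points named above.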

Generate the cluster, server, and client certificates:

GenAndExport_CertToPFX.ps1

# Provide desired name for certificates
$client = "FS-ClientCert"
$cluster = "FS-ClusterCert"
$server = "FS-ServerCert"

#Arrange into one
$cert_names = ($client,$cluster,$server)

# Set pass for exporting certificates
# !!! in the future make client cert with different password!!!
$cert_pass = "MyPass2018"
$pswd = ConvertTo-SecureString -String $cert_pass -Force -AsPlainText

# Set certificate path and folder name
$cert_path = "$pwd\"

# Set action -install or -clean certificates
$action='-install'
#$action='-clean'

# Set file to export info about certificate CN and Thumbprint  
$cert_info_txt = 'Certificate_info.txt'
###############################################################################

Function Prep_CertInfo_txt {
    Write-Output "Certificate Thumbprint info:" | Out-File $cert_info_txt
    Write-Output `n | Out-File $cert_info_txt -Append
}

if ($action -eq '-install') {
    Prep_CertInfo_txt
} else {
    Remove-Item $cert_info_txt
    #foreach ($cert_item in $cert_names) {remove-item $cert_item -Include *.pfx}
}

foreach ($cert_item in $cert_names) {
    # Run "CertSetup.ps1" with arguments to generate (or clean) the appropriate certificate
    powershell -file "CertSetup.ps1" $action -CertSubjectName CN=$cert_item

    # When cleaning, skip the thumbprint lookup and export below
    if ($action -eq '-clean') {continue}

    # Get the thumbprint of each certificate and append it to the info file
    $Thumbprint_cert = (Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Subject -match $cert_item}).Thumbprint -join ';';
    $cert_item, $Thumbprint_cert | Out-File $cert_info_txt -Append
    Write-Output `n | Out-File $cert_info_txt -Append

    # Export the certificate to the appropriate .pfx file
    Get-ChildItem -Path cert:\localMachine\my\$Thumbprint_cert | Export-PfxCertificate -FilePath "$cert_path$cert_item.pfx" -Password $pswd
}
###############################################################################

Import the certificates and set the required permissions:

ImportCertFromPFX_SetPerm.ps1

$client = "FS-ClientCert"
$cluster = "FS-ClusterCert"
$server = "FS-ServerCert"

#Arrange into one
$cert_names = ($client,$cluster,$server)

# Set the name for account
$Service_name = "Network Service"

# Set the password used when the certificates were exported,
# !!! in the future make client cert with different password!!!
$cert_pass = "MyPass2018"

# Set certificate path and folder name
$cert_path = "$pwd\"

###############################################################################

foreach ($cert_item in $cert_names) {

    # Import  certificate
    $PfxFilePath ="$cert_path$cert_item.pfx"

    # Install to LocalMachine Personal Certificate
    Import-PfxCertificate -Exportable -CertStoreLocation Cert:\LocalMachine\My -FilePath $PfxFilePath -Password (ConvertTo-SecureString -String $cert_pass -AsPlainText -Force)
    # Install to LocalMachine Root Certificate
    Import-PfxCertificate -Exportable -CertStoreLocation Cert:\LocalMachine\Root -FilePath $PfxFilePath -Password (ConvertTo-SecureString -String $cert_pass -AsPlainText -Force)
    # Install to CurrentUser My Certificate
    Import-PfxCertificate -Exportable -CertStoreLocation Cert:\CurrentUser\My -FilePath $PfxFilePath -Password (ConvertTo-SecureString -String $cert_pass -AsPlainText -Force)

    #Get Thumbprint for each certificate
    $Thumbprint_cert = (Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Subject -match $cert_item}).Thumbprint -join ';';

    # Set permission by using external PS script "SetCertPermissionForNodes.ps1"
    powershell -file "SetCertPermissionForNodes.ps1" $Thumbprint_cert $Service_name

    }

# Intermediate Certificate for future implementation
# Import-PfxCertificate -Exportable -CertStoreLocation Cert:\LocalMachine\CA -FilePath $PfxFilePath -Password (ConvertTo-SecureString -String $cert_pass -AsPlainText -Force)
###############################################################################
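
After the import has run on a node, it may be worth confirming that each expected certificate is present with its private key. A minimal sketch follows, assuming the certificate subjects have the form CN=<name>. Get-MissingCertName is a hypothetical helper, and the stand-in objects below only imitate the Subject and HasPrivateKey properties of real certificate objects.

```powershell
# Hypothetical helper: returns the names for which no certificate with a
# matching subject and a private key was found in the supplied collection.
function Get-MissingCertName {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][object[]]$Certificates,
        [Parameter(Mandatory)][string[]]$Names
    )

    foreach ($name in $Names) {
        $match = @($Certificates | Where-Object {
            $_.Subject -eq "CN=$name" -and $_.HasPrivateKey
        })
        if ($match.Count -eq 0) { $name }
    }
}

# Stand-in objects for illustration; on a node, pass the real store contents instead.
$fake = @(
    [pscustomobject]@{ Subject = 'CN=FS-ClusterCert'; HasPrivateKey = $true },
    [pscustomobject]@{ Subject = 'CN=FS-ServerCert';  HasPrivateKey = $false }
)

Get-MissingCertName -Certificates $fake -Names 'FS-ClientCert', 'FS-ClusterCert', 'FS-ServerCert'
# -> FS-ClientCert
# -> FS-ServerCert
```

On a cluster machine the call would look like Get-MissingCertName -Certificates (Get-ChildItem Cert:\LocalMachine\My) -Names $cert_names, and an empty result means all three certificates were found.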

Hope this helps someone.