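[Updates below]
We are running into an issue where System.Fabric.FabricNotPrimaryException is frequently thrown from a new service we are developing.
The data flow is:
Tracing a specific exception instance through the flow shows:
That last point is what makes the FabricNotPrimaryException confusing to me, so I would like to know whether there is any further information we could log to determine whether this is the root cause.
Also, are there any other circumstances, perhaps obvious ones, that would cause this exception to be thrown?
Here is the stack from a sample exception: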
System.Fabric.FabricNotPrimaryException:
   at System.Fabric.Store.TStore`5.ThrowIfNotWritable (Microsoft.ServiceFabric.Data.Impl, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Fabric.Store.TStore`5+d__218.MoveNext (Microsoft.ServiceFabric.Data.Impl, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Microsoft.ServiceFabric.Data.Collections.DistributedDictionary`2+<GetOrAddAsync>d__109.MoveNext (Microsoft.ServiceFabric.Data.Impl, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Clients.CoreEngine.Generic.StateManager.CoreEngineStateManager+<>c__DisplayClass32_0+<b__2>d.MoveNext (Clients.CoreEngine.Generic, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Resiliency.Retry.RetryHelper+<>c__DisplayClass2_0`1+<<ExecuteInTransaction>b__0>d.MoveNext (Resiliency, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Resiliency.Retry.RetryHelper+<ExecuteInTransaction>d__2`1.MoveNext (Resiliency, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Clients.CoreEngine.Generic.StateManager.CoreEngineStateManager+<ApplyUpdate>d__32.MoveNext (Clients.CoreEngine.Generic, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Clients.CoreEngine.Generic.Handlers.UpdateSystemEventHandler+d__7.MoveNext (Clients.CoreEngine.Generic, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult (mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089)
   at Clients.CoreEngine.Generic.CoreEngineProcessor+d__25.MoveNext (Clients.CoreEngine.Generic, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
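Additional information: 24/04/2018
We have been able to reproduce this on a local cluster with the debugger attached. It appears:
Here is the stack from one of the exceptions, captured as an event: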
"Timestamp": "2018-04-24T18:03:02.4053087+01:00",
"ProviderName": "Clients-CoreEngineSvc-SAMPLE_CLIENT",
"Id": 8,
"Message": "'CoreEngineProcessor - OnProcessorMessage,' Exception: On processor: [Primary]. ReadStatus: [Granted]. WriteStatus: [Granted]",
"ProcessId": 20732,
"Level": "Error",
"Keywords": "0x0000F00000000080",
"EventName": "ServiceException",
"ActivityID": null,
"RelatedActivityID": null,
"Payload": {
  "serviceName": "fabric:/Clients.Generic.App/CoreEngineSvc",
  "serviceTypeName": "CoreEngineSvcType",
  "partitionId": "6ee32f92-d94e-4cba-b4d1-7ce335625c9c",
  "applicationName": "fabric:/Clients.Generic.App",
  "applicationTypeName": "Clients.Generic.AppType",
  "nodeName": "_Node_0",
  "operationClass": "CoreEngineProcessor",
  "operationMethod": "OnProcessorMessage",
  "exceptionMessage": "",
  "unWrappedException": "Microsoft.ServiceFabric.Data.Impl ::::: at System.Fabric.Store.TStore`5.ThrowIfNotWritable(Int64 tracer) at System.Fabric.Store.TStore`5.d__224.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ServiceFabric.Data.Collections.DistributedDictionary`2.<AddOrUpdateAsync>d__98.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult() at Clients.CoreEngine.Generic.StateManager.CoreEngineStateManager.d__40.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.GetResult() at Clients.CoreEngine.Generic.CoreEngineProcessor.d__28.MoveNext()",
  "exceptionString": "System.Fabric.FabricNotPrimaryException at System.Fabric.Store.TStore`5.ThrowIfNotWritable(Int64 tracer) at System.Fabric.Store.TStore`5.d__224.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Microsoft.ServiceFabric.Data.Collections.DistributedDictionary`2.<AddOrUpdateAsync>d__98.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult() at Clients.CoreEngine.Generic.StateManager.CoreEngineStateManager.d__40.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.GetResult() at Clients.CoreEngine.Generic.CoreEngineProcessor.d__28.MoveNext()",
  "info": "On processor: [Primary]. ReadStatus: [Granted]. WriteStatus: [Granted]",
  "correlationId": "00000000-0000-0000-0000-000000000000",
  "fixtureId": 8173405
}
Answer 0 (score: 0)
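Overall, your case sounds like a combination of two things: Cluster Resource Manager balancing that is either too frequent or misconfigured, and code that does not move on to the correct replica once the exception has been thrown.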
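Even if you have reached the current primary and are in the middle of doing work, you can still get a FabricNotPrimaryException. The read and write status flags are racy (for many of the same reasons that FabricNotReadableException can show up and should be retried). These flags are mainly there to help people bail out early if they are about to do some long-running work (for example, transactional query plan generation, or rebuilding an index), so that they can check and bail before they start. They are a useful optimization, but they tell you nothing beyond what the status was at the moment you checked.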
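The check-then-act race described above can be illustrated with a small, language-neutral simulation. This is only a sketch: `FakeReplica`, `NotPrimaryError`, `apply_update`, and the `before_write` hook are hypothetical names invented for illustration and are not part of the Service Fabric API. The point it shows is that a "granted" status read up front does not guarantee that the later write succeeds.

```python
class NotPrimaryError(Exception):
    """Stand-in for FabricNotPrimaryException (hypothetical, not the real type)."""


class FakeReplica:
    """Toy replica whose write status can be revoked at any moment."""

    def __init__(self):
        self.write_granted = True
        self.store = {}

    def write(self, key, value):
        # The store re-checks the role itself at write time.
        if not self.write_granted:
            raise NotPrimaryError("replica is no longer primary")
        self.store[key] = value


def apply_update(replica, key, value, before_write=lambda: None):
    # Early bail-out: a cheap check before starting any expensive work.
    if not replica.write_granted:
        return "bailed-early"
    before_write()  # window in which the primary can be demoted
    try:
        replica.write(key, value)
        return "written"
    except NotPrimaryError:
        # The earlier check said "granted", yet the write still failed.
        return "not-primary"


replica = FakeReplica()
print(apply_update(replica, "k", 1))  # prints "written"


def demote():
    replica.write_granted = False


print(apply_update(replica, "k", 2, before_write=demote))  # prints "not-primary"
print(apply_update(replica, "k", 3))  # prints "bailed-early"
```

The second call is the case from the question: the status was granted when it was logged, and the write still threw because the role changed in between.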
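When you see FabricNotPrimaryException, you should bail out and return the call to the client, so that it knows it is talking to the wrong replica and can re-resolve, find the new primary, and pick up where it left off.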
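As a rough illustration of that client-side behaviour, the sketch below retries a write and re-resolves the primary each time the replica reports that it is not primary. Everything here is an assumption made for the example: `Cluster`, `resolve_primary`, `write_with_re_resolve`, and `stale_hint` are hypothetical names, and the toy cluster does not replicate data between nodes the way a real one would.

```python
class NotPrimaryError(Exception):
    """Stand-in for FabricNotPrimaryException (hypothetical, not the real type)."""


class Cluster:
    """Toy cluster: one store per node plus the name of the current primary."""

    def __init__(self, nodes):
        self.stores = {name: {} for name in nodes}
        self.primary = nodes[0]

    def resolve_primary(self):
        return self.primary

    def write(self, node, key, value):
        # Only the current primary accepts writes.
        if node != self.primary:
            raise NotPrimaryError(node)
        self.stores[node][key] = value


def write_with_re_resolve(cluster, key, value, max_attempts=3, stale_hint=None):
    """Retry a write, re-resolving the primary after each NotPrimaryError."""
    node = stale_hint or cluster.resolve_primary()
    for attempt in range(1, max_attempts + 1):
        try:
            cluster.write(node, key, value)
            return node, attempt
        except NotPrimaryError:
            # Wrong replica: discard the cached address and look it up again.
            node = cluster.resolve_primary()
    raise NotPrimaryError(f"no primary accepted the write after {max_attempts} attempts")


cluster = Cluster(["node0", "node1"])
cluster.primary = "node1"  # the primary moved after the client cached "node0"
result = write_with_re_resolve(cluster, "key", "value", stale_hint="node0")
print(result)  # prints ('node1', 2)
```

The first attempt goes to the stale address and fails; the second goes to the freshly resolved primary. A real client would usually also add a back-off between attempts.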
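In most cases when you see NotPrimary, one of two things has happened:
1. The primary has just been demoted. This accounts for the race condition you are seeing: when the client called and said "give me the primary", or when you checked the write status, this replica was the primary, but that status was revoked while the work was in progress. Usually this is because the primary moved to another node. The normal causes of movement are nodes going up or down, other services being created or deleted, or application upgrades. If you are not doing upgrades, it is almost certainly the Cluster Resource Manager rebalancing and moving things. If you see excessive movement, dig into that configuration and check whether you should be using different metrics, balancing thresholds, or an entirely different [...] in order to reduce unnecessary movement. By default, SF uses only the default metrics and tries to keep things perfectly balanced, so if the environment is at all dynamic you can definitely get disruptive balancing that interrupts your work.
2. The underlying Service Fabric lease for that node has expired, and the node is about to go down. When the lease at the federation layer fails, all primaries on the node immediately lose their status so that they cannot complete any more work (because they are technically no longer in the cluster). This is less common, but it explains primaries losing status when there is no movement in the cluster.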