How do I implement failover in an Akka.NET cluster using the Akka.FSharp API?

Time: 2017-03-14 19:52:37

Tags: f# akka.net akka.net-cluster akka.fsharp

I have the following cluster node, which acts as a seed:

open Akka
open Akka.FSharp
open Akka.Cluster
open System
open System.Configuration

let systemName = "script-cluster"
let nodeName = sprintf "cluster-node-%s" Environment.MachineName
let akkaConfig = Configuration.parse("""akka {  
                                          actor {
                                            provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
                                          }
                                          remote {
                                            log-remote-lifecycle-events = off
                                            helios.tcp {
                                                hostname = "127.0.0.1"
                                                port = 2551       
                                            }
                                          }
                                          cluster {
                                            roles = ["seed"]  # custom node roles
                                            seed-nodes = ["akka.tcp://script-cluster@127.0.0.1:2551"]
                                            # when a node cannot be reached within 10 sec, mark it as down
                                            auto-down-unreachable-after = 10s
                                          }
                                        }""")
let actorSystem = akkaConfig |> System.create systemName

let clusterHostActor =
    spawn actorSystem nodeName (fun (inbox: Actor<ClusterEvent.IClusterDomainEvent>) -> 
        let cluster = Cluster.Get actorSystem
        cluster.Subscribe(inbox.Self, [| typeof<ClusterEvent.IClusterDomainEvent> |])
        inbox.Defer(fun () -> cluster.Unsubscribe(inbox.Self))
        let rec messageLoop () = 
            actor {
                let! message = inbox.Receive()                        
                // TODO: Handle messages
                match message with
                // `event` is a reserved identifier in F#, so the bindings are named `e`
                | :? ClusterEvent.MemberJoined as e -> printfn "Member %s Joined the Cluster at %O" e.Member.Address.Host DateTime.Now
                | :? ClusterEvent.MemberLeft as e -> printfn "Member %s Left the Cluster at %O" e.Member.Address.Host DateTime.Now
                | other -> printfn "Cluster Received event %O at %O" other DateTime.Now

                return! messageLoop()
            }
        messageLoop())

Then I have an arbitrary node that may die:

open Akka
open Akka.FSharp
open Akka.Cluster
open System
open System.Configuration

let systemName = "script-cluster"
let nodeName = sprintf "cluster-node-%s" Environment.MachineName
let akkaConfig = Configuration.parse("""akka {  
                                          actor {
                                            provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
                                          }
                                          remote {
                                            log-remote-lifecycle-events = off
                                            helios.tcp {
                                                hostname = "127.0.0.1"
                                                port = 0       
                                            }
                                          }
                                          cluster {
                                            roles = ["role-a"]  # custom node roles
                                            seed-nodes = ["akka.tcp://script-cluster@127.0.0.1:2551"]
                                            # when a node cannot be reached within 10 sec, mark it as down
                                            auto-down-unreachable-after = 10s
                                          }
                                        }""")
let actorSystem = akkaConfig |> System.create systemName

let listenerRef =  
    spawn actorSystem "temp2"
    <| fun mailbox ->
        let cluster = Cluster.Get (mailbox.Context.System)
        cluster.Subscribe (mailbox.Self, [| typeof<ClusterEvent.IMemberEvent>|])
        mailbox.Defer <| fun () -> cluster.Unsubscribe (mailbox.Self)
        printfn "Created an actor on node [%A] with roles [%s]" cluster.SelfAddress (String.Join(",", cluster.SelfRoles))
        let rec seed () = 
            actor {
                let! (msg: obj) = mailbox.Receive ()
                match msg with
                | :? ClusterEvent.MemberRemoved          -> printfn "Actor removed %A" msg
                | :? ClusterEvent.IMemberEvent           -> printfn "Cluster event %A" msg
                | _ -> printfn "Received: %A" msg
                return! seed () }
        seed ()

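What is the recommended practice for implementing failover in a cluster?

Specifically, is there a code example showing how a cluster should behave when one of its nodes is no longer available?

  • Should my cluster node spin up a replacement, or is some different behavior expected?
  • Is there a configuration that handles this automatically, which I can set without writing code?
  • What code would I need to write, and where?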

1 answer:

Answer 0: (score: 3)

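First of all, it is better to rely on the MemberUp and MemberRemoved events (both implement the ClusterEvent.IMemberEvent interface, so subscribe to that), because they mark the points at which the node's joining or leaving procedure has completed. The joining and leaving events do not necessarily guarantee that the node is fully operable at the moment they are signaled.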
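As a rough illustration of that advice, here is a minimal sketch of a listener that reacts only to MemberUp and MemberRemoved. It assumes the same Akka.FSharp and Akka.Cluster packages as the question and an already cluster-enabled actor system (for example the `actorSystem` value above). The helper name `spawnMembershipListener` and the actor name are made up for this example.

```fsharp
open Akka.Actor
open Akka.Cluster
open Akka.FSharp

// Hypothetical helper: spawns a membership listener on a cluster-enabled system.
let spawnMembershipListener (system: ActorSystem) =
    spawn system "membership-listener" (fun (mailbox: Actor<obj>) ->
        let cluster = Cluster.Get system
        // IMemberEvent covers MemberUp and MemberRemoved, among other member events.
        cluster.Subscribe(mailbox.Self, [| typeof<ClusterEvent.IMemberEvent> |])
        mailbox.Defer(fun () -> cluster.Unsubscribe(mailbox.Self))
        let rec loop () =
            actor {
                let! msg = mailbox.Receive ()
                match msg with
                | :? ClusterEvent.MemberUp as up ->
                    // The node has finished joining and can be given work.
                    printfn "Node %O is up" up.Member.Address
                | :? ClusterEvent.MemberRemoved as removed ->
                    // The node is gone for good; this is where failover logic would go.
                    printfn "Node %O was removed (previous status: %O)" removed.Member.Address removed.PreviousStatus
                | _ -> () // other member events and the initial cluster state snapshot
                return! loop ()
            }
        loop ())
```

The mailbox is typed as `obj` on purpose, so that the initial cluster state snapshot sent on subscription falls through to the catch-all case instead of being reported as unhandled.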

Regarding failover scenarios:

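  • Replacements can be spun up automatically via the Akka.Cluster.Sharding plugin (read articles 1 and 2 for more information on how it works). There is no equivalent for it in Akka.FSharp, but you can use the Akkling.Cluster.Sharding plugin instead: see the example code.
  • Another way is to create the replacement actors up front on each of the nodes. You can route messages to them using clustered routers or distributed publish/subscribe. This fits best when your scenario is stateless, so that each actor can pick up another actor's work at any time. It is a more general solution for distributing work among many actors living on many different nodes.
  • You can also set up watchers over the processing actors. Using the monitor function, you can order your actor to watch another actor (no matter where it lives). If a node fails, information about the dead actor is sent to all of its watchers in the form of a Terminated message. This way you can implement your own logic, such as recreating the actor on another node. This is actually the most generic approach, since it does not use any extra plugins or configuration, but you have to describe the behavior yourself.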
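The watcher approach from the last point could look roughly like the sketch below. It is only an illustration under some assumptions: `worker` and `watcher` are hypothetical names, and the replacement is spawned locally as a child of the watcher, whereas choosing another node for it (for example via remote deployment) is left out.

```fsharp
open Akka.Actor
open Akka.FSharp

// Hypothetical worker behaviour, used only for illustration.
let worker (mailbox: Actor<string>) =
    let rec loop () =
        actor {
            let! msg = mailbox.Receive ()
            printfn "Worker %s received: %s" mailbox.Self.Path.Name msg
            return! loop ()
        }
    loop ()

// Watches `initial` and, once it terminates, spawns and watches a replacement.
let watcher (initial: IActorRef) (mailbox: Actor<obj>) =
    // After this call the watcher gets a Terminated message when `initial` dies,
    // including the case where the node hosting it is removed from the cluster.
    monitor initial mailbox.Context |> ignore
    let rec loop (current: IActorRef) (generation: int) =
        actor {
            let! msg = mailbox.Receive ()
            match msg with
            | :? Terminated as t when t.ActorRef.Equals(current) ->
                // Assumption: the replacement is created locally, as a child of the watcher.
                let name = sprintf "replacement-%d" generation
                let replacement = spawn mailbox.Context name worker
                monitor replacement mailbox.Context |> ignore
                return! loop replacement (generation + 1)
            | :? string as text ->
                // Forward work to whichever worker is currently alive.
                current <! text
                return! loop current generation
            | _ ->
                mailbox.Unhandled msg
                return! loop current generation
        }
    loop initial 1
```

It would be started with something like `spawn system "guard" (watcher someWorkerRef)`, after which work is sent to the guard rather than to the worker directly. Messages forwarded between the worker's death and the arrival of Terminated end up in dead letters, so anything that needs delivery guarantees has to add its own buffering or retries.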