Multiple nodedown messages in an Erlang cluster

Time: 2012-11-27 04:07:48

Tags: erlang

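I am building a simple gen_server module that monitors the liveness of several remote nodes.

When a remote node registers, the module starts monitoring it with erlang:monitor_node(Node, true). Each node is registered only once (confirmed in the logs).

In the gen_server's handle_info/2 callback, the module catches the {nodedown, Node} message and demonitors that node with erlang:monitor_node(Node, false). I expect to receive this message only once: when the remote node goes down.

When I test the module, I find that when a remote node fails, hundreds of {nodedown, Node} messages are sent to the gen_server (the count varies from a few hundred to a few thousand).

Why does monitor_node send multiple messages, and how can I prevent this behavior?

Edit: here is (part of) the source code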

register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:read(node_info, NodeName) of
        [] ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        [_OldInfo] ->
            error_logger:trace_msg("info of node ~p updated", [NodeName])
    end,
    mnesia:write(NodeInfo).

handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast({shutdown_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:dirty_delete_object(NodeStatus) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction shutdown_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
handle_cast(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.

handle_info({nodedown, Node}, Timer) ->
    monitor_node(Node, false),
    error_logger:info_msg("~p: node ~p down", [?MODULE, Node]),
    mnesia:transaction(fun mnesia:delete/3, [node_info, Node, write]),
    {noreply, Timer};
handle_info(Message, Timer) ->
    error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
    {noreply, Timer}.

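1 answer:

Answer 0 (score: 5)

You call monitor_node(NodeName, true) **INSIDE** an mnesia transaction.

I think the cause is that monitor_node involves message communication (an I/O operation) internally, so it does not belong inside a transaction. Mnesia may evaluate a transaction fun more than once, for example when the transaction is restarted after a lock conflict, and every evaluation calls monitor_node again. The monitor may therefore be registered hundreds of times for the same node, and when that node goes down, one nodedown message is delivered per registration.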

"If a process has made two calls to monitor_node(Node, true) and Node terminates, **two nodedown messages are delivered to the process.** If there is no connection to Node, there will be an attempt to create one. If this fails, a nodedown message is delivered."
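Since one nodedown message arrives per monitor_node(Node, true) call, a handler can defensively drain duplicates that are already queued. The following is only a minimal sketch of that idea: the module name nodedown_flush and the function flush_nodedown/1 are hypothetical and not part of the original code, and the sketch treats the symptom rather than the root cause.

```erlang
-module(nodedown_flush).
-export([flush_nodedown/1]).

%% Removes every {nodedown, Node} message currently queued in the calling
%% process's mailbox and returns how many were removed.
%% Messages for other nodes, and any other messages, are left untouched.
flush_nodedown(Node) ->
    flush_nodedown(Node, 0).

flush_nodedown(Node, Count) ->
    receive
        {nodedown, Node} ->
            flush_nodedown(Node, Count + 1)
    after 0 ->
        %% Nothing matching is left in the mailbox: stop without blocking.
        Count
    end.
```

It could be called at the start of the handle_info({nodedown, Node}, Timer) clause so that the clause body runs once per failure instead of once per queued message.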

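Move that line out of the transaction, or use only the case expression, and try again. For reference, this is the code in question, where monitor_node/2 is called inside the transaction fun: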

register_node(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:read(node_info, NodeName) of
        [] ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered", [NodeName]);
        [_OldInfo] ->
            error_logger:trace_msg("info of node ~p updated", [NodeName])
    end,
    mnesia:write(NodeInfo).
handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeStatus]) of
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
        _ ->
            ok
    end,
    {noreply, Timer};
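One possible way to move the call out of the transaction is to let the transaction fun report whether the node is new, and to call monitor_node/2 only after the transaction has committed. This is a sketch under assumptions, not the asker's final code: the module name node_registry_sketch and the status field of #node_info{} are hypothetical (the original shows only the node field), and the callbacks are reduced to the relevant clause.

```erlang
-module(node_registry_sketch).
-export([register_node/1, handle_cast/2]).

%% Hypothetical record layout: only the `node` field appears in the original.
-record(node_info, {node, status}).

%% Transaction fun: Mnesia operations only, no side effects, so it is safe
%% for Mnesia to evaluate it more than once.
%% Returns true if the node was not yet in the table, false otherwise.
register_node(#node_info{node = NodeName} = NodeInfo) ->
    IsNew = (mnesia:read(node_info, NodeName) =:= []),
    ok = mnesia:write(NodeInfo),
    IsNew.

%% Side effects (monitoring, logging) happen once, after the commit.
handle_cast({register_node, #node_info{node = NodeName} = NodeInfo}, Timer) ->
    case mnesia:transaction(fun register_node/1, [NodeInfo]) of
        {atomic, true} ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered~n", [NodeName]);
        {atomic, false} ->
            error_logger:info_msg("info of node ~p updated~n", [NodeName]);
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p~n",
                                     [Reason])
    end,
    {noreply, Timer}.
```

Because the gen_server handles one cast at a time, the check and the monitor_node/2 call cannot interleave with another registration of the same node in this process.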

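Explanation of side effects in Mnesia transactions, from the Mnesia User's Guide:

"Mnesia sets and releases locks dynamically as the transactions execute. It is therefore very dangerous to execute code with transaction side effects. In particular, a receive statement inside a transaction can lead to a situation where the transaction hangs and never returns, which in turn can cause locks not to release. This situation can bring the whole system to a standstill, as other transactions that execute in other processes, or on other nodes, are forced to wait for the defective transaction."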