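I am building a simple gen_server module that monitors the liveness of several remote nodes.
When a remote node registers, the module starts monitoring it with erlang:monitor_node(Node, true). Each node registers only once (confirmed from the logs).
In the gen_server's handle_info/2 callback, it catches the {nodedown, Node} message and demonitors that node with erlang:monitor_node(Node, false). I expect to receive this message exactly once: when the remote node goes down.
When I test the module, I find that when a remote node goes down, hundreds of {nodedown, Node} messages are sent to the gen_server (anywhere from a few hundred to a few thousand).
Why does monitor_node send multiple messages? How can I prevent this behaviour?
Edit: here is (part of) the source code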
    register_node(#node_info{node = NodeName} = NodeInfo) ->
        case mnesia:read(node_info, NodeName) of
            [] ->
                monitor_node(NodeName, true),
                error_logger:info_msg("node ~p registered", [NodeName]);
            [_OldInfo] ->
                error_logger:trace_msg("info of node ~p updated", [NodeName])
        end,
        mnesia:write(NodeInfo).

    handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
        case mnesia:transaction(fun register_node/1, [NodeStatus]) of
            {aborted, Reason} ->
                error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
            _ ->
                ok
        end,
        {noreply, Timer};
    handle_cast({shutdown_node, #node_info{} = NodeStatus}, Timer) ->
        case mnesia:dirty_delete_object(NodeStatus) of
            {aborted, Reason} ->
                error_logger:warning_msg("transaction shutdown_node failed: ~p", [Reason]);
            _ ->
                ok
        end,
        {noreply, Timer};
    handle_cast(Message, Timer) ->
        error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
        {noreply, Timer}.

    handle_info({nodedown, Node}, Timer) ->
        monitor_node(Node, false),
        error_logger:info_msg("~p: node ~p down", [?MODULE, Node]),
        mnesia:transaction(fun mnesia:delete/3, [node_info, Node, write]),
        {noreply, Timer};
    handle_info(Message, Timer) ->
        error_logger:warning_msg("~p: received unknown message ~p", [?MODULE, Message]),
        {noreply, Timer}.
Answer 0 (score: 5)
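You have called monitor_node(NodeName, true) **INSIDE** an mnesia transaction.
I think that is the cause. monitor_node has side effects, since internally it involves message passing (an I/O operation), so it does not belong inside a transaction.
Mnesia can restart a transaction fun, for example when a lock cannot be acquired, so that line may have registered the monitor hundreds of times. When the node then goes down, a 'nodedown' message is delivered for each of those calls.
From the erlang:monitor_node/2 documentation: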
If a process has made two calls to monitor_node(Node, true) and Node terminates,
**two nodedown messages are delivered to the process.** If there is no connection
to Node, there will be an attempt to create one. If this fails, a nodedown
message is delivered.
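Given the behaviour quoted above, one defensive option is to remember which nodes already have a monitor and skip the call for those. The following is only a minimal sketch: the module name `monitor_once`, both function names, and the injected `MonitorFun` are hypothetical, and in a real server the fun would be `fun(N) -> erlang:monitor_node(N, true) end`.

```erlang
-module(monitor_once).
-export([ensure_monitored/3, forget/2]).

%% Monitored is a set of nodes that already have a monitor.
%% MonitorFun is injected (an assumption of this sketch) so the logic
%% can be exercised on a node that is not distributed.
ensure_monitored(Node, Monitored, MonitorFun) ->
    case sets:is_element(Node, Monitored) of
        true ->
            %% Already monitored: do not stack a second monitor.
            Monitored;
        false ->
            true = MonitorFun(Node),
            sets:add_element(Node, Monitored)
    end.

%% Call this when the nodedown message has been handled.
forget(Node, Monitored) ->
    sets:del_element(Node, Monitored).
```

The set would live in the gen_server state, so it is only touched from the server process and never from inside a transaction fun.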
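Please move the monitor_node call out of the transaction, or just use the "case" expression on its own, and try again.
For reference, this is the code in question: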
    register_node(#node_info{node = NodeName} = NodeInfo) ->
        case mnesia:read(node_info, NodeName) of
            [] ->
                monitor_node(NodeName, true),
                error_logger:info_msg("node ~p registered", [NodeName]);
            [_OldInfo] ->
                error_logger:trace_msg("info of node ~p updated", [NodeName])
        end,
        mnesia:write(NodeInfo).

    handle_cast({register_node, #node_info{} = NodeStatus}, Timer) ->
        case mnesia:transaction(fun register_node/1, [NodeStatus]) of
            {aborted, Reason} ->
                error_logger:warning_msg("transaction register_node failed: ~p", [Reason]);
            _ ->
                ok
        end,
        {noreply, Timer};
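One way to move the call out of the transaction might look like the sketch below: the transaction fun only reads and writes the table and reports whether the node is new, and the caller starts monitoring after the commit. This is an illustration, not the asker's actual module. The module name `node_registry_sketch`, the `status` field of the record, and the helpers `setup/0` and `register_and_monitor/1` are assumptions made so the example is self-contained.

```erlang
-module(node_registry_sketch).
-export([setup/0, register_node/1, register_and_monitor/1]).

%% Assumed record shape; only the `node` field appears in the question.
-record(node_info, {node, status}).

%% Hypothetical helper: starts Mnesia and creates a RAM-only table.
setup() ->
    ok = mnesia:start(),
    case mnesia:create_table(node_info,
                             [{attributes, record_info(fields, node_info)}]) of
        {atomic, ok} -> ok;
        {aborted, {already_exists, node_info}} -> ok
    end.

%% Transaction fun: database operations only, so a restart is harmless.
%% Returns new | updated instead of performing the side effect itself.
register_node(#node_info{node = NodeName} = NodeInfo) ->
    Result = case mnesia:read(node_info, NodeName) of
                 [] -> new;
                 [_OldInfo] -> updated
             end,
    ok = mnesia:write(NodeInfo),
    Result.

%% Runs the transaction, then calls monitor_node once, after the commit.
register_and_monitor(#node_info{node = NodeName} = NodeInfo) ->
    case mnesia:transaction(fun register_node/1, [NodeInfo]) of
        {atomic, new} ->
            monitor_node(NodeName, true),
            error_logger:info_msg("node ~p registered~n", [NodeName]),
            new;
        {atomic, updated} ->
            error_logger:info_msg("info of node ~p updated~n", [NodeName]),
            updated;
        {aborted, Reason} ->
            error_logger:warning_msg("transaction register_node failed: ~p~n",
                                     [Reason]),
            {error, Reason}
    end.
```

In the gen_server, the body of the `register_node` clause of handle_cast/2 would call `register_and_monitor(NodeStatus)` and then return `{noreply, Timer}`.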
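Explanation of side effects in mnesia transactions:
Mnesia sets and releases locks dynamically as the transaction executes. It is therefore dangerous to execute code with side effects inside a transaction. In particular, a receive statement inside a transaction can lead to a situation where the transaction hangs and never returns, which in turn can cause locks not to be released. This situation can bring the whole system to a standstill, as other transactions that execute in other processes, or on other nodes, are forced to wait for the defective transaction.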
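Besides hanging, a side effect inside a transaction fun is repeated every time the fun is restarted. The sketch below imitates that with a plain retry loop; it does not use Mnesia and is not how Mnesia is implemented. The module `retry_demo` and all of its functions are hypothetical, and the message sent to self() stands in for a call such as monitor_node.

```erlang
-module(retry_demo).
-export([run/2, demo/0]).

%% Simulated "transaction": evaluates Fun until it returns {ok, Value},
%% restarting it whenever it returns retry, at most MaxAttempts times.
run(Fun, MaxAttempts) when MaxAttempts > 0 ->
    run(Fun, 1, MaxAttempts).

run(Fun, Attempt, MaxAttempts) ->
    case Fun(Attempt) of
        {ok, Value} ->
            {atomic, Value};
        retry when Attempt < MaxAttempts ->
            run(Fun, Attempt + 1, MaxAttempts);
        retry ->
            {aborted, too_many_retries}
    end.

%% The fun "fails to get its lock" twice and succeeds on attempt 3.
%% Its side effect (a message to self()) happens on every attempt.
demo() ->
    Self = self(),
    Fun = fun(Attempt) ->
                  Self ! side_effect,
                  case Attempt < 3 of
                      true -> retry;
                      false -> {ok, done}
                  end
          end,
    Result = run(Fun, 10),
    {Result, count_side_effects(0)}.

%% Drains and counts the side_effect messages in the mailbox.
count_side_effects(N) ->
    receive
        side_effect -> count_side_effects(N + 1)
    after 0 ->
        N
    end.
```

Calling `retry_demo:demo()` returns `{{atomic, done}, 3}`: the simulated transaction commits once, but its side effect ran three times.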