Question

I have observed the following code pattern in the Linux kernel, for example in net/sched/act_api.c and in many other places:

rtnl_lock();
rtnetlink_rcv_msg(skb, ...);
  replay:
  ret = process_msg(skb);
    ...
    /* try to obtain symbol which is in module. */
    /* if fail, try to load the module, otherwise use the symbol */
    a = get_symbol();
    if (a == NULL) {
       rtnl_unlock();
       request_module();
       rtnl_lock();
       /* now verify that we can obtain symbols from requested module and return EAGAIN.*/
       a = get_symbol();
       module_put();
       return -EAGAIN;
    }
  ...
  if (ret == -EAGAIN)
     goto replay;
  ...
rtnl_unlock();

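After request_module() succeeds, the symbol we are interested in is available in kernel memory and we can use it. What I don't understand is why the code returns EAGAIN and looks the symbol up again. Why can't it simply continue right after request_module()?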

Answer 1

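If you look at the current implementation in the Linux kernel, there is a comment right after the second call to tc_lookup_action_n() (the equivalent of get_symbol() in the code above) that explains the reason: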

rtnl_unlock();
request_module("act_%s", act_name);
rtnl_lock();

a_o = tc_lookup_action_n(act_name);

/* We dropped the RTNL semaphore in order to
 * perform the module load.  So, even if we
 * succeeded in loading the module we have to
 * tell the caller to replay the request.  We
 * indicate this using -EAGAIN.
 */
if (a_o != NULL) {
    err = -EAGAIN;
    goto err_mod;
}

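Even if the module could be requested and loaded, the RTNL semaphore was dropped in order to load it, because loading a module is an operation that can sleep (and it is not the "standard way" this function executes). Anything that was looked up or validated under the lock before it was dropped may have changed in the meantime, so the function returns EAGAIN to tell the caller to replay the whole request.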
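The drop-the-lock-then-replay idea can be sketched in userspace. This is only an illustration under stated assumptions, not kernel code: `big_lock` stands in for the RTNL lock, `load_module_unlocked()` for request_module(), and `lookup_ops()` for tc_lookup_action_n(). All of these names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the role of RTNL */
static bool module_loaded;                                   /* protected by big_lock */

struct ops { const char *name; };
static struct ops demo_ops = { "demo" };

/* Look the symbol up. The caller must hold big_lock. */
static struct ops *lookup_ops(void)
{
    return module_loaded ? &demo_ops : NULL;
}

/* Stand-in for request_module(): it may sleep, so it runs without big_lock held. */
static void load_module_unlocked(void)
{
    pthread_mutex_lock(&big_lock);
    module_loaded = true;
    pthread_mutex_unlock(&big_lock);
}

/* The caller holds big_lock on entry and still holds it on return. */
static int init_one(struct ops **out)
{
    struct ops *o;

    assert(out != NULL);

    o = lookup_ops();
    if (o == NULL) {
        pthread_mutex_unlock(&big_lock);
        load_module_unlocked();
        pthread_mutex_lock(&big_lock);

        /*
         * The lock was dropped, so state examined earlier may be stale.
         * Even if the symbol is now present, ask the caller to replay.
         */
        return lookup_ops() != NULL ? -EAGAIN : -ENOENT;
    }

    *out = o;
    return 0;
}
```

With the lock held, a first call to `init_one()` returns -EAGAIN and leaves `*out` untouched; a second call finds the symbol immediately and returns 0 without ever releasing the lock.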

Edit to clarify:

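If we look at the call sequence when adding a new action (which may cause the required module to be loaded), we have: tc_ctl_action() -> tcf_action_add() -> tcf_action_init() -> tcf_action_init_1(). Now, if we trace the EAGAIN error back up to tc_ctl_action(), in the case RTM_NEWACTION: branch, we see that an EAGAIN return value causes tcf_action_add() to be called again.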
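The caller side of that sequence can be sketched as follows. This is a simplified userspace model, not the kernel source: `ctl_action()` and `action_add()` are hypothetical stand-ins for tc_ctl_action() and tcf_action_add(), and the "module load" is simulated with a flag.

```c
#include <assert.h>
#include <errno.h>

static int module_loaded; /* simulated: set once the "module" is available */
static int add_calls;     /* counts how often action_add() runs */

/* Hypothetical stand-in for tcf_action_add(): asks for a replay after a load. */
static int action_add(void)
{
    add_calls++;
    if (!module_loaded) {
        module_loaded = 1; /* pretend the module was loaded with the lock dropped */
        return -EAGAIN;
    }
    return 0;
}

enum msg_type { MSG_NEWACTION, MSG_OTHER };

/* Hypothetical stand-in for tc_ctl_action(): replays the request on -EAGAIN. */
static int ctl_action(enum msg_type type)
{
    int ret;

    switch (type) {
    case MSG_NEWACTION:
replay:
        ret = action_add();
        if (ret == -EAGAIN)
            goto replay; /* restart from the top, now with the module present */
        break;
    default:
        ret = -EINVAL;
        break;
    }

    assert(ret <= 0); /* kernel-style result: 0 or a negative errno */
    return ret;
}
```

In this model, the first `ctl_action(MSG_NEWACTION)` runs `action_add()` twice (one failed attempt, one replay) and returns 0; later calls run it only once.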

Mutex unlock and request_module() behavior
