(Linux 4.4)
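I am trying to send a message from a kernel module to a user process via Generic Netlink. The user process does not seem to receive the message: nlmsg_unicast returns -111.
Here is what I know:
I am using libmnl in the user process (as you may have guessed from my use of mnl_socket_recvfrom).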
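For reference, the numeric return value can be decoded in user space. The following is a minimal sketch, assuming Linux on x86_64 (errno numbering differs on some architectures); `describe_kernel_retval` is a hypothetical helper name, not part of any library.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: kernel functions report errors as negated errno
 * values, so flip the sign before asking libc for the message text. */
static const char *describe_kernel_retval(int retval)
{
    return strerror(retval < 0 ? -retval : retval);
}
```

With glibc, `describe_kernel_retval(-111)` returns the text "Connection refused", i.e. the value is -ECONNREFUSED.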
uname -a
Linux yaron-VirtualBox 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Basically, this is my sending code in the kernel:
struct sk_buff *msg;
struct sock *socket;
struct netlink_kernel_cfg nlCfg = {
.groups = 1,
.flags = 0,
.input = NULL,
.cb_mutex = NULL,
.bind = NULL,
.unbind = NULL,
.compare = NULL,
};
void *msg_head;
int retval;
struct net init_net;
/* Open a socket */
socket = netlink_kernel_create(&init_net, NETLINK_GENERIC, &nlCfg);
if (socket == NULL) goto CmdFail;
/* Allocate space */
msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
if (msg == NULL) goto CmdFail;
/* Generate message header
* arguments of genlmsg_put:
* struct sk_buff *,
* int portID, <-- this is sender portID
* int netlinkSeqNum,
* struct genl_family *,
* int flags,
* u8 command_idx */
msg_head = genlmsg_put(msg, 0, ++netlinkSeqNum, &genlFamily, 0, MYFAMILY_CMD_MYMSG);
if (msg_head == NULL) goto CmdFail;
/* Add a MYFAMILY_ATTR_MYMSG attribute (message to be sent) */
retval = nla_put_string(msg, MYFAMILY_ATTR_MYMSG, "Temporary message");
if (retval != 0) goto CmdFail;
/* Finalize the message */
genlmsg_end(msg, msg_head); /* void inline function - no return value */
/* Send the message */
retval = nlmsg_unicast(socket, msg, userNetlinkPortID);
printk("nlmsg_unicast returned %d\n", retval);
if (retval != 0) goto CmdFail;
netlink_kernel_release(socket);
return;
CmdFail:
printk(KERN_ALERT "*** Failed to send command !\n");
netlink_kernel_release(socket);
return;
Basically, this is my receiving code in the user process:
char bufferHdr[getpagesize()];
struct nlmsghdr *nlHeader;
struct genlmsghdr *nlHeaderExtraHdr;
int numBytes, seq, ret_val;
// Set up the header.
// Function mnl_nlmsg_put_header will zero out a length of bufferHdr sufficient to hold a Netlink header,
// and initialize the nlmsg_len field in that space to the size of a header.
// It returns a pointer to bufferHdr.
if ( (nlHeader = mnl_nlmsg_put_header(bufferHdr)) != (struct nlmsghdr *) bufferHdr ) {
perror("mnl_nlmsg_put_header failed");
exit(EXIT_FAILURE);
}
nlHeader->nlmsg_type = genetlinkFamilyID;
// Function mnl_nlmsg_put_extra_header extends the header, to allow for these extra fields.
if ( (nlHeaderExtraHdr = (struct genlmsghdr *) mnl_nlmsg_put_extra_header(nlHeader, sizeof(struct genlmsghdr))) != (struct genlmsghdr *) (bufferHdr + sizeof(struct nlmsghdr)) ) {
perror("mnl_nlmsg_put_extra_header failed");
exit(EXIT_FAILURE);
}
// No command to set
// No attributes to set
// Wait for a message, and process it
while (1) {
numBytes = mnl_socket_recvfrom(nlSocket, bufferHdr, sizeof(bufferHdr));
if (numBytes == -1) {
perror("mnl_socket_recvfrom returned error");
break;
}
// Callback run queue handler - use it to call getMsgCallback
std::cout << "received a msg, handling it" << std::endl;
ret_val = mnl_cb_run(bufferHdr, numBytes, seq, portid, getMsgCallback, NULL);
if (ret_val == -1) {
//perror("mnl_cb_run failed");
break;
} else if (ret_val == 0)
break;
}
return ret_val;
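Addendum: After studying the kernel source (on elixir.free-electrons.com), I suspect that my message never reaches the user process; any debugging suggestions would be much appreciated.
Here is what I see: nlmsg_unicast calls netlink_unicast, which in turn calls netlink_getsockbyportid, shown below: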
static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
{
struct sock *sock;
struct netlink_sock *nlk;
sock = netlink_lookup(sock_net(ssk), ssk->sk_protocol, portid);
if (!sock)
return ERR_PTR(-ECONNREFUSED);
/* Don't bother queuing skb if kernel socket has no input function */
nlk = nlk_sk(sock);
if (sock->sk_state == NETLINK_CONNECTED &&
nlk->dst_portid != nlk_sk(ssk)->portid) {
sock_put(sock);
return ERR_PTR(-ECONNREFUSED);
}
return sock;
}
I guess that one of the two conditions here that bail out and return -ECONNREFUSED is being triggered.
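To make the two rejection paths concrete, here is a user-space model of the check. It is only a sketch, not kernel code: `struct model_sock` and `model_check` are hypothetical names, and the table scan stands in for netlink_lookup, which in the real kernel also matches on network namespace and protocol.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the fields the kernel check reads (hypothetical). */
struct model_sock {
    uint32_t portid;     /* port ID this socket is bound to */
    uint32_t dst_portid; /* peer recorded by connect(), if any */
    int connected;       /* models sk_state == NETLINK_CONNECTED */
};

/* Returns 0 if delivery would proceed, -ECONNREFUSED otherwise. */
static int model_check(const struct model_sock *table, size_t n,
                       const struct model_sock *sender, uint32_t portid)
{
    const struct model_sock *dst = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i].portid == portid) {
            dst = &table[i];
            break;
        }
    }

    /* Condition 1: no socket is bound to the requested port ID. */
    if (dst == NULL)
        return -ECONNREFUSED;

    /* Condition 2: the receiver is connected to a different peer. */
    if (dst->connected && dst->dst_portid != sender->portid)
        return -ECONNREFUSED;

    return 0;
}
```

In this model a kernel-side sender has port ID 0, so the first condition fires when the destination port ID does not match any bound socket, and the second fires when the receiver has connected itself to a nonzero peer.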
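Any suggestions on how to debug whether either of these conditions holds? I cannot call netlink_lookup or nlk_sk directly from my module code. I guess these symbols are not exported, and neither are the functions they call. Many of them are buried in af_netlink.h and af_netlink.c, which I suppose are not available when building an external module, at least not in the normal way. (It does not look like af_netlink.h ships as part of the distribution.)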
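One check that needs no kernel internals is to print the port ID the user-space socket is actually bound to and compare it with the value the module passes as userNetlinkPortID. The sketch below assumes a plain file descriptor for an already bound AF_NETLINK socket; `netlink_portid_of` is a hypothetical helper name. With libmnl, the descriptor is available from mnl_socket_get_fd, and mnl_socket_get_portid reports the same value directly.

```c
#include <sys/socket.h>
#include <linux/netlink.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: report the netlink port ID that fd is bound to.
 * Returns 0 on success, -1 if fd is not a usable AF_NETLINK socket. */
static int netlink_portid_of(int fd, uint32_t *portid)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    if (addr.nl_family != AF_NETLINK)
        return -1;

    *portid = addr.nl_pid;
    return 0;
}
```

If the printed value differs from the port ID used in the nlmsg_unicast call, the lookup in netlink_getsockbyportid would be expected to fail.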