Question

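Suppose the following series of events occurs:

We set up a listening socket.
Thread A blocks waiting for the listening socket to become readable, using EPOLLIN | EPOLLEXCLUSIVE.
Thread B also blocks waiting for the listening socket to become readable, also using EPOLLIN | EPOLLEXCLUSIVE.
An incoming connection arrives at the listening socket, making the socket readable, and the kernel elects to wake up thread A.
But before thread A actually wakes up and calls accept, a second incoming connection arrives at the listening socket.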
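
The registration each thread performs in this scenario might look like the following. This is only a minimal sketch: the helper name watch_exclusive is made up for illustration, and it assumes one epoll instance per thread on Linux 4.5 or later, where EPOLLEXCLUSIVE is available.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create a private epoll instance and register
 * listen_fd on it with EPOLLIN | EPOLLEXCLUSIVE. EPOLLET is not set,
 * so the registration is level-triggered.
 * Returns the epoll fd, or -1 on error. */
int watch_exclusive(int listen_fd) {
  int epfd = epoll_create1(0);
  if (epfd < 0)
    return -1;

  struct epoll_event ev = {0};
  ev.events = EPOLLIN | EPOLLEXCLUSIVE;
  ev.data.fd = listen_fd;

  /* EPOLLEXCLUSIVE is only accepted together with EPOLL_CTL_ADD. */
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0) {
    close(epfd);
    return -1;
  }
  return epfd;
}
```

Each thread would then block in epoll_wait on the fd returned by its own call to watch_exclusive.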

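Here, the socket is already readable, so the second connection doesn't change that. This is level-triggered epoll, so according to the normal rules the second connection can be treated as a no-op, and the second thread doesn't need to be woken. Of course, not waking the second thread would rather defeat the whole purpose of EPOLLEXCLUSIVE. But my trust in API designers doing the right thing isn't as strong as it once was, and I can't find anything in the documentation that rules this out.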

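Questions

a) Is the above scenario possible, where two connections arrive but only one thread is woken? Or is it guaranteed that every distinct incoming connection on a listening socket wakes another thread?

b) Is there a general rule for predicting how EPOLLEXCLUSIVE and level-triggered epoll interact?

c) What about EPOLLIN | EPOLLEXCLUSIVE and EPOLLOUT | EPOLLEXCLUSIVE for byte-stream fds, such as a connected TCP socket or a pipe? For example, what happens if more data arrives while a pipe is already readable?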

Answer 1

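Edited (the original answer follows the code used for testing)

To make things clear, I'll go over EPOLLEXCLUSIVE as it relates to edge-triggered events (EPOLLET) as well as level-triggered events, to show how these affect the expected behavior.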

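As you well know:

Edge triggered: once you set EPOLLET, events are triggered only if they change the state of the fd. This means that only the first event is triggered, and no new events are triggered until that event has been fully handled.

This design is explicitly meant to prevent epoll_wait from returning because of an event that is already being handled (i.e., when new data arrives while EPOLLIN was already raised but read hasn't been called yet, or not all of the data has been read).

The edge-triggered rule is simple: all events of the same type (e.g. EPOLLIN) are merged until all the available data has been processed.

For a listening socket, the EPOLLIN event won't be triggered again until all existing listen "backlog" sockets have been accepted using accept.

For a byte stream, new events won't be triggered until all the available bytes have been read from the stream (the buffer has been emptied).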
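Level triggered: level-triggered events, on the other hand, behave closer to how the legacy select (or poll) operates, which allows epoll to be used with older code.

The event-merging rule is more complex: events of the same type are merged only if no one is waiting for an event (no one is waiting for epoll_wait to return), or if multiple events happen before epoll_wait can return. Otherwise, any event causes epoll_wait to return.

For a listening socket, the EPOLLIN event is triggered every time a client connects, unless no one is waiting for epoll_wait to return. In that case the next call to epoll_wait returns immediately, and all the EPOLLIN events that occurred in the meantime are merged into a single event.

For a byte stream, a new event is triggered every time new data comes in, unless of course no one is waiting for epoll_wait to return. In that case the next call returns immediately for all the data that arrived until epoll_wait returned (even if it arrived in different chunks or events).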
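
The merging that happens when nobody is waiting can be observed on a pipe. The following is a small sketch under stated assumptions, not part of the test further down: the function name poll_twice_after_two_writes is made up, and it uses a zero timeout so that epoll_wait only polls.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: two writes hit a pipe while nobody is waiting,
 * then epoll_wait is polled twice without reading anything.
 * counts[i] receives the number of events reported by call i.
 * Pass 0 for level-triggered or EPOLLET for edge-triggered.
 * Returns 0 on success, -1 on setup failure. */
int poll_twice_after_two_writes(uint32_t extra_flags, int counts[2]) {
  int p[2];
  if (pipe(p) < 0)
    return -1;

  int rc = -1;
  int epfd = epoll_create1(0);
  if (epfd >= 0) {
    struct epoll_event ev = {0};
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "a", 1) == 1 && /* first "event" */
        write(p[1], "b", 1) == 1) { /* second "event" */
      struct epoll_event got[4];
      counts[0] = epoll_wait(epfd, got, 4, 0); /* timeout 0: just poll */
      counts[1] = epoll_wait(epfd, got, 4, 0);
      rc = 0;
    }
    close(epfd);
  }
  close(p[0]);
  close(p[1]);
  return rc;
}
```

With extra_flags set to 0, both calls should report one event: the two writes show up as a single readiness event, and it is reported again because the data is still unread. With EPOLLET, the first call should report one event and the second none, since the fd state has not changed in between.
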
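Exclusive return: the EPOLLEXCLUSIVE flag is used to prevent the "thundering herd" behavior, so only a single epoll_wait caller is woken up for each fd wake-up event.

As I pointed out before, for edge-triggered states an fd wake-up event is a change in the fd state. So no further EPOLLIN events are raised until all the data has been read (the listening socket's backlog has been emptied).

For level-triggered events, on the other hand, each EPOLLIN invokes a wake-up event. If no one is waiting, these events are merged.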

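Following the example in your question:

For level-triggered events: every time a client connects, a single thread returns from epoll_wait. BUT, if two more clients were to connect while both threads were busy accepting the first two clients, these EPOLLIN events would merge into a single event, and the next call to epoll_wait would return immediately with that merged event.

In the context of the example given in the question, thread B is expected to "wake up" due to epoll_wait returning.

In this case, both threads will "race" towards accept.

However, this doesn't defeat the EPOLLEXCLUSIVE directive or its intent.

The EPOLLEXCLUSIVE directive is meant to prevent the "thundering herd" phenomenon. In this case, two threads are racing to accept two connections. Each thread can (presumably) call accept safely, with no errors. If three threads were used, the third would keep on sleeping.

If EPOLLEXCLUSIVE weren't used, all the epoll_wait threads would be woken up whenever a connection was available. As soon as the first connection arrived, both threads would have been racing to accept a single connection (resulting in a possible error for one of them).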
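For edge-triggered events: only one thread is expected to receive the "wake up" call. That thread is expected to accept all the waiting connections (empty the listen "backlog"). No more EPOLLIN events will be raised for that socket until the backlog is emptied.

The same applies to readable sockets and pipes. The thread that was woken up is expected to deal with all the readable data. This prevents waiting threads from attempting to read the data concurrently and running into file lock race conditions.

I would recommend (and this is what I do) setting the listening socket to non-blocking mode and calling accept in a loop until an EAGAIN (or EWOULDBLOCK) error is raised, indicating that the backlog is empty. There is no way to avoid the risk of events being merged. The same is true for reading from a socket.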

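Testing this with code:

I wrote a simple test with some sleep calls and blocking sockets. Client sockets are initiated only after both threads have started waiting in epoll_wait.

Client thread initiation is delayed, so client 1 and client 2 start a second apart.

Once a server thread is woken up, it sleeps for a second (allowing the second client to do its thing) before calling accept. Maybe the servers should sleep a little longer, but it seems close enough to manage the scheduler without resorting to condition variables.

Here are the results of my test code (which might be a mess, I'm not the best person for test design)...

On Ubuntu 16.10, which supports EPOLLEXCLUSIVE, the test results show that the listening threads are woken up one after the other, in response to the clients. In the example in the question, thread B is woken up.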

```
Test address: <null>:8000
Server thread 2 woke up with 1 events
Server thread 2 will sleep for a second, to let things happen.
client number 1 connected
Server thread 1 woke up with 1 events
Server thread 1 will sleep for a second, to let things happen.
client number 2 connected
Server thread 2 accepted a connection and saying hello.
client 1: Hello World - from server thread 2.
Server thread 1 accepted a connection and saying hello.
client 2: Hello World - from server thread 1.
```

To compare, on Ubuntu 16.04 (without EPOLLEXCLUSIVE support) both threads are woken up for the first connection. Since I use blocking sockets, the second thread hangs on accept until client #2 connects.

```
main.c:178:2: warning: #warning EPOLLEXCLUSIVE undeclared, test is futile [-Wcpp]
 #warning EPOLLEXCLUSIVE undeclared, test is futile
  ^
Test address: <null>:8000
Server thread 1 woke up with 1 events
Server thread 1 will sleep for a second, to let things happen.
Server thread 2 woke up with 1 events
Server thread 2 will sleep for a second, to let things happen.
client number 1 connected
Server thread 1 accepted a connection and saying hello.
client 1: Hello World - from server thread 1.
client number 2 connected
Server thread 2 accepted a connection and saying hello.
client 2: Hello World - from server thread 2.
```

For one more comparison, the results for level-triggered kqueue show that both threads are woken up for the first connection. Since I use blocking sockets, the second thread hangs on accept until client #2 connects.

```
Test address: <null>:8000
client number 1 connected
Server thread 2 woke up with 1 events
Server thread 1 woke up with 1 events
Server thread 2 will sleep for a second, to let things happen.
Server thread 1 will sleep for a second, to let things happen.
Server thread 2 accepted a connection and saying hello.
client 1: Hello World - from server thread 2.
client number 2 connected
Server thread 1 accepted a connection and saying hello.
client 2: Hello World - from server thread 1.
```

My test code was (sorry for the lack of comments and the messy code, I wasn't writing for future maintenance):

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#define ADD_EPOLL_OPTION 0 // define as EPOLLET or 0

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <netdb.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#if !defined(__linux__) && !defined(__CYGWIN__)
#include <sys/event.h>
#define reactor_epoll 0
#else
#define reactor_epoll 1
#include <sys/epoll.h>
#include <sys/timerfd.h>
#endif

int sock_listen(const char *address, const char *port);
void *listen_threard(void *arg);
void *client_thread(void *arg);
int server_fd;
char const *address = NULL;
char const *port = "8000";

int main(int argc, char const *argv[]) {
  if (argc == 2) {
    port = argv[1];
  } else if (argc == 3) {
    port = argv[2];
    address = argv[1];
  }
  fprintf(stderr, "Test address: %s:%s\n", address ? address : "<null>", port);
  server_fd = sock_listen(address, port);
  if (server_fd < 0)
    exit(-1);
  /* code */
  pthread_t threads[4];

  // two server threads first, then two clients started a second apart
  for (size_t i = 0; i < 2; i++) {
    if (pthread_create(threads + i, NULL, listen_threard, (void *)i))
      perror("couldn't initiate server thread"), exit(-1);
  }
  for (size_t i = 2; i < 4; i++) {
    sleep(1);
    if (pthread_create(threads + i, NULL, client_thread, (void *)i))
      perror("couldn't initiate client thread"), exit(-1);
  }
  // join only server threads.
  for (size_t i = 0; i < 2; i++) {
    pthread_join(threads[i], NULL);
  }
  close(server_fd);
  sleep(1);
  return 0;
}

/**
Sets a socket to non blocking state.
*/
inline int sock_set_non_block(int fd) // Thanks to Bjorn Reese
{
/* If they have O_NONBLOCK, use the Posix way to do it */
#if defined(O_NONBLOCK)
  /* Fixme: O_NONBLOCK is defined but broken on SunOS 4.1.x and AIX 3.2.5. */
  int flags;
  if (-1 == (flags = fcntl(fd, F_GETFL, 0)))
    flags = 0;
  // printf("flags initial value was %d\n", flags);
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
#else
  /* Otherwise, use the old way of doing it */
  static int flags = 1;
  return ioctl(fd, FIONBIO, &flags);
#endif
}

/* open a listening socket */
int sock_listen(const char *address, const char *port) {
  int srvfd;
  // setup the address
  struct addrinfo hints;
  struct addrinfo *servinfo;       // will point to the results
  memset(&hints, 0, sizeof hints); // make sure the struct is empty
  hints.ai_family = AF_UNSPEC;     // don't care IPv4 or IPv6
  hints.ai_socktype = SOCK_STREAM; // TCP stream sockets
  hints.ai_flags = AI_PASSIVE;     // fill in my IP for me
  if (getaddrinfo(address, port, &hints, &servinfo)) {
    perror("addr err");
    return -1;
  }
  // get the file descriptor
  srvfd = socket(servinfo->ai_family, servinfo->ai_socktype,
                 servinfo->ai_protocol);
  if (srvfd < 0) {
    perror("socket err");
    freeaddrinfo(servinfo);
    return -1;
  }
  // // keep the server socket blocking for the test.
  // // make sure the socket is non-blocking
  // if (sock_set_non_block(srvfd) < 0) {
  //   perror("couldn't set socket as non blocking! ");
  //   freeaddrinfo(servinfo);
  //   close(srvfd);
  //   return -1;
  // }

  // avoid the "address taken"
  {
    int optval = 1;
    setsockopt(srvfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));
  }
  // bind the address to the socket
  {
    int bound = 0;
    for (struct addrinfo *p = servinfo; p != NULL; p = p->ai_next) {
      if (!bind(srvfd, p->ai_addr, p->ai_addrlen)) {
        bound = 1;
        break; // stop at the first address that binds
      }
    }

    if (!bound) {
      // perror("bind err");
      freeaddrinfo(servinfo);
      close(srvfd);
      return -1;
    }
  }
  freeaddrinfo(servinfo);
  // listen in
  if (listen(srvfd, SOMAXCONN) < 0) {
    perror("couldn't start listening");
    close(srvfd);
    return -1;
  }
  return srvfd;
}

/* will wait for an event (up to 5 seconds), sleep for a second, then accept
 * a single connection and finish */
void *listen_threard(void *arg) {

  int epoll_fd;
  ssize_t event_count;
#if reactor_epoll
#ifndef EPOLLEXCLUSIVE
#warning EPOLLEXCLUSIVE undeclared, test is futile
#define EPOLLEXCLUSIVE 0
#endif
  // create the epoll wait fd
  epoll_fd = epoll_create1(0);
  if (epoll_fd < 0)
    perror("couldn't create epoll fd"), exit(1);
  // add the server fd to the epoll watchlist
  {
    struct epoll_event chevent = {0};
    chevent.data.ptr = (void *)((uintptr_t)server_fd);
    chevent.events =
        EPOLLOUT | EPOLLIN | EPOLLERR | EPOLLEXCLUSIVE | ADD_EPOLL_OPTION;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, server_fd, &chevent);
  }
  // wait with epoll
  struct epoll_event events[10];
  event_count = epoll_wait(epoll_fd, events, 10, 5000);
#else
  // testing on BSD, use kqueue
  epoll_fd = kqueue();
  if (epoll_fd < 0)
    perror("couldn't create kqueue fd"), exit(1);
  // add the server fd to the kqueue watchlist
  {
    struct kevent chevent[2];
    EV_SET(chevent, server_fd, EVFILT_READ, EV_ADD | EV_ENABLE, 0, 0,
           (void *)((uintptr_t)server_fd));
    EV_SET(chevent + 1, server_fd, EVFILT_WRITE, EV_ADD | EV_ENABLE, 0, 0,
           (void *)((uintptr_t)server_fd));
    kevent(epoll_fd, chevent, 2, NULL, 0, NULL);
  }
  // wait with kqueue
  static struct timespec reactor_timeout = {.tv_sec = 5, .tv_nsec = 0};
  struct kevent events[10];
  event_count = kevent(epoll_fd, NULL, 0, events, 10, &reactor_timeout);
#endif

  close(epoll_fd);

  if (event_count <= 0) {
    fprintf(stderr, "Server thread %zu wakeup no events / error\n",
            (size_t)arg + 1);
    perror("errno ");
    return NULL;
  }

  fprintf(stderr, "Server thread %zu woke up with %zd events\n",
          (size_t)arg + 1, event_count);
  fprintf(stderr,
          "Server thread %zu will sleep for a second, to let things happen.\n",
          (size_t)arg + 1);
  sleep(1);
  int connfd;
  struct sockaddr_storage client_addr;
  socklen_t client_addrlen = sizeof client_addr;
  /* accept a single connection; the socket is blocking in this test, so this
   * call hangs if another thread already took the connection */
  if ((connfd = accept(server_fd, (struct sockaddr *)&client_addr,
                       &client_addrlen)) >= 0) {
    fprintf(stderr,
            "Server thread %zu accepted a connection and saying hello.\n",
            (size_t)arg + 1);
    if (write(connfd, arg ? "Hello World - from server thread 2."
                          : "Hello World - from server thread 1.",
              35) < 35)
      perror("server write failed");
    close(connfd);
  } else {
    fprintf(stderr, "Server thread %zu failed to accept a connection",
            (size_t)arg + 1);
    perror(": ");
  }
  return NULL;
}

void *client_thread(void *arg) {

  int fd;
  // setup the address
  struct addrinfo hints;
  struct addrinfo *addrinfo;       // will point to the results
  memset(&hints, 0, sizeof hints); // make sure the struct is empty
  hints.ai_family = AF_UNSPEC;     // don't care IPv4 or IPv6
  hints.ai_socktype = SOCK_STREAM; // TCP stream sockets
  hints.ai_flags = AI_PASSIVE;     // fill in my IP for me
  if (getaddrinfo(address, port, &hints, &addrinfo)) {
    perror("client couldn't initiate address");
    return NULL;
  }
  // get the file descriptor
  fd =
      socket(addrinfo->ai_family, addrinfo->ai_socktype, addrinfo->ai_protocol);
  if (fd < 0) {
    perror("client couldn't create socket");
    freeaddrinfo(addrinfo);
    return NULL;
  }
  // // // Leave the socket blocking for the test.
  // // make sure the socket is non-blocking
  // if (sock_set_non_block(fd) < 0) {
  //   freeaddrinfo(addrinfo);
  //   close(fd);
  //   return -1;
  // }

  if (connect(fd, addrinfo->ai_addr, addrinfo->ai_addrlen) < 0 &&
      errno != EINPROGRESS) {
    fprintf(stderr, "client number %zu FAILED\n", (size_t)arg - 1);
    perror("client connect failure");
    close(fd);
    freeaddrinfo(addrinfo);
    return NULL;
  }
  freeaddrinfo(addrinfo);
  fprintf(stderr, "client number %zu connected\n", (size_t)arg - 1);
  char buffer[128];
  if (read(fd, buffer, 35) < 35) {
    perror("client: read error");
    close(fd);
  } else {
    buffer[35] = 0;
    fprintf(stderr, "client %zu: %s\n", (size_t)arg - 1, buffer);
    close(fd);
  }
  return NULL;
}
```

P.S.

As a final recommendation, I would consider having no more than a single thread and a single epoll fd per process. This way the "thundering herd" is a non-issue and EPOLLEXCLUSIVE (which is still very new and isn't widely supported) can be disregarded. The only "thundering herd" this still exposes is for the limited number of shared sockets, where the race condition might be good for load balancing.

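Original Answer

I'm not sure I understand the confusion, so I'll go over EPOLLET and EPOLLEXCLUSIVE to show their combined expected behavior.

As you well know:

Once you set EPOLLET (edge triggered), events are triggered on fd state changes rather than on fd events.

This design is explicitly meant to prevent epoll_wait from returning because of an event that is already being handled (i.e., when new data arrives while EPOLLIN was already raised but read hasn't been called yet, or not all of the data has been read).

For a listening socket, the EPOLLIN event won't be triggered again until all existing listen "backlog" sockets have been accepted using accept.
The EPOLLEXCLUSIVE flag is used to prevent the "thundering herd" behavior, so only a single epoll_wait caller is woken up for each fd wake-up event.

As I pointed out before, for edge-triggered states an fd wake-up event is a change in the fd state. So no further EPOLLIN events are raised until all the data has been read (the listening socket's backlog has been emptied).

When merging these behaviors, and following the example in your question, only one thread is expected to receive the "wake up" call. That thread is expected to accept all the waiting connections (empty the listen "backlog"), or no more EPOLLIN events will be raised for that socket.

The same applies to readable sockets and pipes. The thread that was woken up is expected to deal with all the readable data. This prevents waiting threads from attempting to read the data concurrently and running into file lock race conditions.

I would recommend that you consider avoiding edge-triggered events if you mean to call accept only once for every epoll_wait wake-up event. Regardless of whether you use EPOLLEXCLUSIVE, you run the risk of not emptying the existing "backlog", so that no new wake-up events will be raised.

Alternatively, I would recommend (and this is what I do) setting the listening socket to non-blocking mode and calling accept in a loop until an EAGAIN (or EWOULDBLOCK) error is raised, indicating that the backlog is empty.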

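EDIT 1: Level Triggered Events

It seems, as Nathaniel pointed out in the comments, that I totally misunderstood the question... I guess I'm used to EPOLLET being the misunderstood element.

So, what happens with normal, level-triggered events (NOT EPOLLET)?

Well... the expected behavior is the exact mirror image (opposite) of edge-triggered events.

For listening sockets, epoll_wait is expected to return whenever a new connection is available, whether or not accept was called after a previous event.

Events are only "merged" if no one is waiting in epoll_wait... in which case the next call to epoll_wait will return immediately.

In the context of the example given in the question, thread B is expected to "wake up" due to epoll_wait returning.

In this case, both threads will "race" towards accept.

However, this doesn't defeat the EPOLLEXCLUSIVE directive or its intent.

The EPOLLEXCLUSIVE directive is meant to prevent the "thundering herd" phenomenon. In this case, two threads are racing to accept two connections. Each thread can (presumably) call accept safely, with no errors. If three threads were used, the third would keep on sleeping.

If EPOLLEXCLUSIVE weren't used, all the epoll_wait threads would be woken up whenever a connection was available. As soon as the first connection arrived, both threads would have been racing to accept a single connection (resulting in a possible error for one of them).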

Answer 2

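This is only a partial answer, but Jason Baron (the author of the EPOLLEXCLUSIVE patch) just responded to an email I sent him. He confirmed that, when using EPOLLEXCLUSIVE in level-triggered mode, he does think it's possible for two connections to arrive but only one thread to be woken (thread B remains asleep). So when using EPOLLEXCLUSIVE, you have to use the same kind of defensive programming as you would for edge-triggered epoll, regardless of whether you set EPOLLET.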

How does epoll's EPOLLEXCLUSIVE mode interact with level-triggering?
