Is writev() really atomic?

Date: 2019-04-05 02:09:16

Tags: c++ c linux-kernel system-calls

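man writev says:

   The data transfers performed by readv() and writev() are atomic: the data written by writev() is written as a single block that is not intermingled with output from writes in other processes (but see pipe(7) for an exception); analogously, readv() is guaranteed ...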

And this is from man 7 pipe:

   O_NONBLOCK disabled, n <= PIPE_BUF
          All n bytes are written atomically; write(2) may block if there is not room for n bytes to be written immediately

   O_NONBLOCK enabled, n <= PIPE_BUF
          If there is room to write n bytes to the pipe, then write(2) succeeds immediately, writing all n bytes; otherwise write(2) fails, with errno set to EAGAIN.

   O_NONBLOCK disabled, n > PIPE_BUF
          The write is nonatomic: the data given to write(2) may be interleaved with write(2)s by other process; the write(2) blocks until n bytes have been written.

   O_NONBLOCK enabled, n > PIPE_BUF
          If  the pipe is full, then write(2) fails, with errno set to EAGAIN.  Otherwise, from 1 to n bytes may be written (i.e., a "partial write" may occur; the caller should check the return value from write(2) to see how many bytes were actually written), and these bytes may be interleaved with writes by other processes.
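The four cases above all hinge on whether the request size n exceeds PIPE_BUF. For writev() the relevant n is the sum of all iov_len fields. A minimal sketch of that check follows; the helper names iov_total_len and fits_in_pipe_buf are hypothetical and not part of any library, and the 512-byte fallback is only the POSIX minimum, used if the headers do not define PIPE_BUF.

```c
#include <assert.h>
#include <limits.h>   /* PIPE_BUF */
#include <stddef.h>
#include <sys/uio.h>  /* struct iovec */

#ifndef PIPE_BUF
#define PIPE_BUF 512  /* POSIX minimum (_POSIX_PIPE_BUF); fallback only */
#endif

/* Hypothetical helper: total number of bytes described by an iovec array. */
size_t iov_total_len(const struct iovec *iov, int iovcnt)
{
    assert(iovcnt >= 0);
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return total;
}

/* Hypothetical helper: non-zero if the whole request is at most PIPE_BUF
 * bytes, i.e. it falls under the "n <= PIPE_BUF" cases of pipe(7). */
int fits_in_pipe_buf(const struct iovec *iov, int iovcnt)
{
    return iov_total_len(iov, iovcnt) <= PIPE_BUF;
}
```

For the two-element iovec used in the test program below ("ST" plus "\n"), iov_total_len returns 3, far below PIPE_BUF.
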

$ cat writev.c
#include <string.h>
#include <sys/uio.h>

int
main(int argc,char **argv) {
    static char part1[] = "ST";
    static char part2[] = "\n";
    struct iovec iov[2];

    iov[0].iov_base = part1;
    iov[0].iov_len = strlen(part1);

    iov[1].iov_base = part2;
    iov[1].iov_len = strlen(part2);

    writev(1,iov,2);

    return 0;
}
$ gcc writev.c
$ unbuffer bash -c 'for ((i=0; i<50; i++)); do ./a.out & ./a.out; done' | wc -c
300  # < PIPE_BUF

# Run the following several times to get the output corrupted
$ unbuffer bash -c 'for ((i=0; i<50; i++)); do ./a.out & ./a.out; done' | sort | uniq -c
      4 
     92 ST
      4 STST

If writev is atomic (according to the documentation), can anyone explain why the output of different writes is interleaved?

Update

Some relevant data from strace -fo /tmp/log unbuffer bash -c 'for ((i=0; i<10000; i++)); do ./a.out & ./a.out; done' | sort | uniq -c:

13301 writev(1, [{iov_base="ST", iov_len=2}, {iov_base="\n", iov_len=1}], 2 <unfinished ...>
13302 mprotect(0x56397d7d8000, 4096, PROT_READ) = 0
13302 mprotect(0x7f7190c68000, 4096, PROT_READ) = 0
13302 munmap(0x7f7190c51000, 90695)     = 0
13302 writev(1, [{iov_base="ST", iov_len=2}, {iov_base="\n", iov_len=1}], 2) = 3
13301 <... writev resumed> )            = 3
24814 <... select resumed> )            = 1 (in [4])
13302 exit_group(0 <unfinished ...>
13301 exit_group(0 <unfinished ...>
13302 <... exit_group resumed>)         = ?
13301 <... exit_group resumed>)         = ?
24814 futex(0x55b5b8c11cc4, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
24807 <... futex resumed> )             = 0
24814 <... futex resumed> )             = 1
24807 futex(0x7f7f55e8f920, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
13302 +++ exited with 0 +++
24807 <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
13301 +++ exited with 0 +++
24807 futex(0x7f7f55e8f920, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
24814 futex(0x7f7f55e8f920, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
24807 <... futex resumed> )             = 0
24814 <... futex resumed> )             = 0
24807 read(4,  <unfinished ...>
24814 select(6, [5], [], [], NULL <unfinished ...>
24807 <... read resumed> "STST\n\n", 4096) = 6
24808 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13302
24807 write(1, "STST\n\n", 6 <unfinished ...>

1 Answer:

Answer 0 (score: 0)

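As specified, yes, for pipes, when the total length of the iov does not exceed PIPE_BUF, because:

   The writev() function shall be equivalent to write(), except as described below.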

No exception is made for pipes (the word "pipe" does not even appear in the writev specification).

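In practice on Linux, possibly not. The equivalence of writev to a single write only holds for kernel file types that implement the "new" (circa 15 years ago) iov-based read/write backend. Some terminal-like devices only implement the old interface that takes a single buffer, and for those Linux emulates writev (or readv) as multiple write calls (or read calls, respectively). The readv case is problematic too, as you can see in this commit to musl libc.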
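If that emulation is a concern, one way to take it out of the picture is to copy the segments into one contiguous buffer and issue a single write(2). The following is a minimal sketch of that idea, not something the kernel or libc provides; the name write_coalesced is hypothetical. It only guarantees that the kernel sees one buffer. The single write() can still be partial, and what the receiving device does with it is outside this code's control.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: gather all iovec segments into one heap buffer and
 * hand them to the kernel with a single write(2) call.
 * Returns what write(2) returns, or -1 with errno set on failure. */
ssize_t write_coalesced(int fd, const struct iovec *iov, int iovcnt)
{
    assert(iovcnt >= 0);

    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    if (total == 0)
        return 0;                       /* nothing to send */

    char *buf = malloc(total);
    if (buf == NULL) {
        errno = ENOMEM;
        return -1;
    }

    size_t off = 0;
    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > 0) {
            memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
            off += iov[i].iov_len;
        }
    }

    ssize_t n = write(fd, buf, total);  /* one syscall, one buffer */
    int saved_errno = errno;            /* free() may clobber errno */
    free(buf);
    errno = saved_errno;
    return n;
}
```

In the question's test program, this would replace the call writev(1, iov, 2) with write_coalesced(1, iov, 2). The cost is an extra allocation and copy per call, which is the trade-off writev exists to avoid.
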

I'm not sure whether pipes are affected by this issue. You would have to dig into the kernel source.
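Short of reading the kernel source, the pipe case can at least be probed empirically. The sketch below is an assumption-laden experiment, not a proof: several child processes each writev() the same 3-byte record "ST\n" into one shared pipe, and the parent checks that the combined stream is an exact repetition of that record. The function name count_misplaced_bytes and its parameters are hypothetical. A result of 0 only means no interleaving was observed in that run.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: nproc children each writev() the record "ST\n"
 * iters times into one shared pipe. Returns the number of bytes that are
 * out of place in the combined stream (0 = no interleaving observed),
 * or -1 if the experiment itself failed. */
long count_misplaced_bytes(int nproc, int iters)
{
    static const char expected[3] = { 'S', 'T', '\n' };
    int fds[2];

    assert(nproc >= 0 && iters >= 0);
    if (pipe(fds) != 0)
        return -1;

    int started = 0;
    for (; started < nproc; started++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* reported as failure below */
        if (pid == 0) {                 /* child: writer */
            char part1[] = "ST", part2[] = "\n";
            struct iovec iov[2] = { { part1, 2 }, { part2, 1 } };
            close(fds[0]);
            for (int i = 0; i < iters; i++)
                if (writev(fds[1], iov, 2) != 3)
                    _exit(1);
            _exit(0);
        }
    }
    close(fds[1]);                      /* parent keeps only the read end */

    /* All records are identical, so an un-interleaved stream must be
     * "ST\n" repeated: byte k has to equal expected[k % 3]. */
    long total = 0, misplaced = 0;
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; i++, total++)
            if (buf[i] != expected[total % 3])
                misplaced++;
    }
    close(fds[0]);

    /* Reap the writers; assumes the caller has no other children. */
    int ok = (n == 0 && started == nproc);
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }
    if (!ok || total != 3L * nproc * iters)
        return -1;
    return misplaced;
}
```

A call such as count_misplaced_bytes(8, 10000) exercises a real pipe directly, without unbuffer or any terminal-like device between the writers and the reader, so it isolates the pipe case from the one discussed above.
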