File mapping and file system synchronization

Time: 2018-08-20 11:19:12

Tags: c linux shared-memory

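I have a file containing some data, and that file is also memory-mapped, so I have both a file descriptor and a pointer to the mapped pages. Most of the time the data is only read through the mapping, but eventually it is also modified.

A modification consists of changing some data inside the file (a header update) and appending some new data (that is, writing at the current end of the file).

This data structure can be accessed from different threads, so to prevent collisions I synchronize access to it (mutex and friends).

During a modification I use both the file mapping and the file descriptor. The header is updated implicitly by modifying the mapped memory, whereas the new data is written to the file through the appropriate API (WriteFile on Windows, write on POSIX). Note that the new data and the header belong to different pages.

Because the modification changes the file size, the memory mapping is re-initialized after every such modification. That is, it is unmapped and then mapped again (with the new size).

I realize that writes to mapped memory are "asynchronous" with respect to the file system and that their order is not guaranteed, but I thought there would be no problem because I explicitly close the file mapping, which should (IMHO) act as a kind of flush point.

Now, this works without problems on Windows, but on Linux (Android, to be exact) the mapped data occasionally turns out to be temporarily inconsistent (the data is fine on retry). It seems that it does not reflect the newly appended data.

Do I have to call some synchronization API to ensure the data is flushed properly? If so, which one should I use: sync, msync, syncfs, or something else?

Thanks.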

Edit:

Here is some pseudocode that illustrates the scenario I am dealing with. (The real code is of course more complex.)

struct CompressedGrid
{
    mutex m_Lock;
    int m_FileHandle;    
    void* m_pMappedMemory;

    Hdr* get_Hdr() { return /* the mapped memory with some offset*/; }

    void SaveGridCell(int idx, const Cell& cCompressed)
    {
        AutoLock scope(m_Lock);

        // Write to mapped memory
        get_Hdr()->m_pCellOffset[idx] = /* current end of file */;

        // Append the data
        lseek64(m_FileHandle, 0, SEEK_END);
        write(m_FileHandle, cCompressed.pPtr, cCompressed.nSize);

        // re-map
        munmap(...);
        m_pMappedMemory = mmap(...); // specify the new file size of course
    }

    bool DecodeGridCell(int idx, Cell& cRaw)
    {
        AutoLock scope(m_Lock);

        uint64_t nOffs = get_Hdr()->m_pCellOffset[idx];
        if (!nOffs)
            return false; // unavail

        const uint8_t* p = (const uint8_t*) m_pMappedMemory + nOffs;

        cRaw.DecodeFrom(p); // This is where the problem appears!

        return true;
    }
};

1 answer:

Answer 0 (score: 2)

Map the file using addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, offset).

If the file size changes, use newaddr = mremap(addr, len, newlen, MREMAP_MAYMOVE) to update the mapping to reflect that. To extend the file, use ftruncate(fd, newlen) before remapping it.
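As a minimal sketch of that sequence (Linux only, since mremap() is a Linux extension): the helper name grow_mapping(), the demo function, and the /tmp scratch path are my own assumptions for illustration, not part of the question's code.

```c
#define _GNU_SOURCE             /* mremap() and MREMAP_MAYMOVE are Linux extensions */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: extend the file behind fd to newlen bytes, then resize
 * the existing shared mapping to match. Returns the (possibly moved) mapping,
 * or MAP_FAILED with errno set. */
void *grow_mapping(int fd, void *addr, size_t len, size_t newlen)
{
    if (ftruncate(fd, (off_t)newlen) == -1)
        return MAP_FAILED;
    return mremap(addr, len, newlen, MREMAP_MAYMOVE);
}

/* Demo: map one page of a scratch file, grow it to two pages, write into the
 * new page through the mapping, and read it back with pread().
 * Returns 0 if the file shows what was written through the mapping. */
int grow_mapping_demo(void)
{
    char         path[] = "/tmp/grow-demo-XXXXXX";   /* assumed scratch path */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char         buf[5];
    int          result = -1;
    int          fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);               /* the file disappears once fd is closed */

    if (ftruncate(fd, (off_t)page) == 0) {
        size_t len = page;
        char  *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_NORESERVE, fd, 0);
        if (map != MAP_FAILED) {
            char *grown = grow_mapping(fd, map, len, 2 * page);
            if (grown != MAP_FAILED) {
                map = grown;    /* the old address may no longer be valid */
                len = 2 * page;
                memcpy(map + page, "hello", 5);
                if (pread(fd, buf, 5, (off_t)page) == 5 &&
                    memcmp(buf, "hello", 5) == 0)
                    result = 0;
            }
            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

Because MREMAP_MAYMOVE lets the kernel relocate the mapping, any pointers derived from the old address have to be recomputed from the returned one.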

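You can use mprotect(addr, len, protflags) to change the protection (read/write) on any pages in the mapping (both addr and len should be aligned to a page boundary). If the mapping is too large to fit in memory at once, you can also tell the kernel about your future accesses via madvise(), but the kernel seems to be pretty good at managing readahead and the like on its own.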
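A small sketch of both calls, assuming a four-page scratch file whose first page plays the role of a header; the function name and the /tmp path are hypothetical.

```c
#define _GNU_SOURCE             /* for the madvise() and mkstemp() declarations */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Demo (hypothetical): map four pages of a scratch file, make the first page
 * read-only, and hint that the remaining pages will be read sequentially.
 * Returns 0 on success, -1 on failure. */
int protect_and_advise_demo(void)
{
    char         path[] = "/tmp/protect-demo-XXXXXX";   /* assumed scratch path */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const size_t len = 4 * page;
    int          result = -1;
    int          fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            map[0] = 'H';       /* the first page is still writable here */

            /* map is page-aligned because mmap() returned it. */
            if (mprotect(map, page, PROT_READ) == 0 &&
                madvise(map + page, len - page, MADV_SEQUENTIAL) == 0 &&
                map[0] == 'H')  /* reading the protected page still works */
                result = 0;

            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```

After the mprotect() call, a store to the first page would raise SIGSEGV, which can be a cheap way to catch accidental header writes outside the locked update path.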

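When you make changes to the mapping, use msync(partaddr, partlen, MS_SYNC | MS_INVALIDATE) or msync(partaddr, partlen, MS_ASYNC | MS_INVALIDATE) to ensure that the changes to the partlen bytes starting at partaddr are visible to other mappings and to readers of the file. With MS_SYNC, the call returns only after the update is complete. An MS_ASYNC call tells the kernel to do the update but does not wait until it is done. If there are no other memory maps of the file, MS_INVALIDATE does nothing; if there are, it tells the kernel to make sure the changes are reflected in those as well.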
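One practical detail: on Linux, msync() fails with EINVAL when the address is not page-aligned, so a partaddr in the middle of a page has to be rounded down first. A minimal sketch, where sync_range() and the demo are hypothetical names and the page size is assumed to be a power of two:

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: msync() an arbitrary byte range inside a mapping by
 * rounding the start down to a page boundary and extending the length. */
int sync_range(void *partaddr, size_t partlen, int flags)
{
    const uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    const uintptr_t start = (uintptr_t)partaddr & ~(page - 1);
    const size_t    extra = (size_t)((uintptr_t)partaddr - start);

    return msync((void *)start, partlen + extra, flags);
}

/* Demo: change six bytes in the middle of the second page through the
 * mapping, sync just that range, then check them with pread().
 * Returns 0 if the file contents match. */
int sync_range_demo(void)
{
    char         path[] = "/tmp/msync-demo-XXXXXX";   /* assumed scratch path */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    const size_t len = 2 * page;
    char         buf[6];
    int          result = -1;
    int          fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)len) == 0) {
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map + page + 100, "header", 6);
            if (sync_range(map + page + 100, 6, MS_SYNC | MS_INVALIDATE) == 0 &&
                pread(fd, buf, 6, (off_t)(page + 100)) == 6 &&
                memcmp(buf, "header", 6) == 0)
                result = 0;
            munmap(map, len);
        }
    }
    close(fd);
    return result;
}
```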

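In Linux kernels since 2.6.19, MS_ASYNC does nothing, because the kernel tracks the changes properly anyway (no msync() is needed, except possibly just before munmap()). I do not know whether Android kernels carry patches that change this behaviour; I suspect not. It is still a good idea to keep the calls in the code, for portability across POSIXy systems.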
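To make the "possibly just before munmap()" case concrete, here is a sketch of an append-then-remap step with an explicit msync() as the flush point. The helper append_and_remap(), the demo, and the /tmp path are assumptions of mine; whether the extra flush cures the symptom seen on Android is not something I have verified.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: append datalen bytes with write(), flush and drop the
 * old mapping, then map the whole (grown) file again.
 * Returns 0 on success, -1 on failure; *map is NULL if the old mapping
 * was already released when the failure happened. */
int append_and_remap(int fd, void **map, size_t *maplen,
                     const void *data, size_t datalen)
{
    const char       *p = data;
    const char *const end = p + datalen;
    struct stat       st;
    void             *fresh;

    /* Append the new data through the descriptor. */
    if (lseek(fd, 0, SEEK_END) == (off_t)-1)
        return -1;
    while (p < end) {
        ssize_t n = write(fd, p, (size_t)(end - p));
        if (n > 0)
            p += n;
        else if (n != -1 || errno != EINTR)
            return -1;
    }

    /* Explicit flush point for changes made through the mapping. */
    if (msync(*map, *maplen, MS_SYNC) == -1 || munmap(*map, *maplen) == -1)
        return -1;
    *map = NULL;
    *maplen = 0;

    /* Map again, now covering the appended data as well. */
    if (fstat(fd, &st) == -1)
        return -1;
    fresh = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (fresh == MAP_FAILED)
        return -1;
    *map = fresh;
    *maplen = (size_t)st.st_size;
    return 0;
}

/* Demo: store the cell offset in the header via the mapping, append the cell
 * via write(), remap, and read the cell back through the new mapping. */
int append_and_remap_demo(void)
{
    char         path[] = "/tmp/append-demo-XXXXXX";   /* assumed scratch path */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void        *map;
    size_t       maplen = page;
    int          result = -1;
    int          fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)page) == 0) {
        map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            ((uint64_t *)map)[0] = (uint64_t)page;   /* header: cell starts at old EOF */

            if (append_and_remap(fd, &map, &maplen, "cell", 4) == 0) {
                const uint64_t offs = ((const uint64_t *)map)[0];
                if (offs == page && maplen == page + 4 &&
                    memcmp((const char *)map + offs, "cell", 4) == 0)
                    result = 0;
            }
            if (map != NULL)
                munmap(map, maplen);
        }
    }
    close(fd);
    return result;
}
```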

  

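"the mapped data occasionally turns out to be temporarily inconsistent"

Well, unless you do use msync(partaddr, partlen, MS_SYNC | MS_INVALIDATE), the kernel will do the update whenever it sees fit.

So, if some changes need to be visible to readers of the file before you proceed, use msync(areaptr, arealen, MS_SYNC | MS_INVALIDATE) in the process that makes those updates.

If you do not care about the exact moment, use msync(areaptr, arealen, MS_ASYNC | MS_INVALIDATE). It will be a no-op on current Linux kernels, but it is a good idea to keep the calls for portability (perhaps commented out, if performance requires it) and to remind developers about the (lack of) synchronization expectations.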


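As I commented to the OP, I cannot observe the synchronization issue on Linux at all. (That does not mean it cannot happen on Android, because Android kernels are derivatives of the Linux kernel, not exactly the same thing.)

I do believe that the msync() call is not needed at all on Linux kernels since 2.6.19, as long as the mapping uses the flags MAP_SHARED | MAP_NORESERVE and the file is not opened with the O_DIRECT flag. The reason for this belief is that in that case the mapping and the file accesses should both use the very same page cache pages.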

Here are two test programs that can be used to explore this on Linux. First, a single-process test, test-single.c:

#define  _POSIX_C_SOURCE  200809L
#define  _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

static inline int read_from(const int fd, void *const to, const size_t len, const off_t offset)
{
    char       *p = (char *)to;
    char *const q = (char *)to + len;
    ssize_t     n;

    if (lseek(fd, offset, SEEK_SET) != offset)
        return errno = EIO;

    while (p < q) {
        n = read(fd, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1)
            return errno = EIO;
        else
        if (errno != EINTR)
            return errno;
    }

    return 0;
}

static inline int write_to(const int fd, const void *const from, const size_t len, const off_t offset)
{
    const char *const q = (const char *)from + len;
    const char       *p = (const char *)from;
    ssize_t           n;

    if (lseek(fd, offset, SEEK_SET) != offset)
        return errno = EIO;

    while (p < q) {
        n = write(fd, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1)
            return errno = EIO;
        else
        if (errno != EINTR)
            return errno;
    }

    return 0;
}

int main(int argc, char *argv[])
{
    unsigned long  tests, n, merrs = 0, werrs = 0;
    size_t         page;
    long          *map, data[2];
    int            fd;
    char           dummy;

    if (argc != 3) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME COUNT\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program will test synchronization between a memory map\n");
        fprintf(stderr, "and reading/writing the underlying file, COUNT times.\n");
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    if (sscanf(argv[2], " %lu %c", &tests, &dummy) != 1 || tests < 1) {
        fprintf(stderr, "%s: Invalid number of tests to run.\n", argv[2]);
        return EXIT_FAILURE;
    }

    /* Create the file. */
    page = sysconf(_SC_PAGESIZE);
    fd = open(argv[1], O_RDWR | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        fprintf(stderr, "%s: Cannot create file: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
    if (ftruncate(fd, page) == -1) {
        fprintf(stderr, "%s: Cannot resize file: %s.\n", argv[1], strerror(errno));
        unlink(argv[1]);
        return EXIT_FAILURE;
    }

    /* Map it. */
    map = mmap(NULL, page, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_NORESERVE, fd, 0);
    if (map == MAP_FAILED) {
        fprintf(stderr, "%s: Cannot map file: %s.\n", argv[1], strerror(errno));
        unlink(argv[1]);
        close(fd);
        return EXIT_FAILURE;
    }

    /* Test loop. */
    for (n = 0; n < tests; n++) {

        /* Update map. */
        map[0] = (long)(n + 1);
        map[1] = (long)(~n);

        /* msync(map, 2 * sizeof map[0], MS_SYNC | MS_INVALIDATE); */

        /* Check the file contents. */
        if (read_from(fd, data, sizeof data, 0)) {
            fprintf(stderr, "read_from() failed: %s.\n", strerror(errno));
            munmap(map, page);
            unlink(argv[1]);
            close(fd);
            return EXIT_FAILURE;
        }
        werrs += (data[0] != (long)(n + 1) || data[1] != (long)(~n));

        /* Update data. */
        data[0] = (long)(n * 386131);
        data[1] = (long)(n * -257);
        if (write_to(fd, data, sizeof data, 0)) {
            fprintf(stderr, "write_to() failed: %s.\n", strerror(errno));
            munmap(map, page);
            unlink(argv[1]);
            close(fd);
            return EXIT_FAILURE;
        }
        merrs += (map[0] != (long)(n * 386131) || map[1] != (long)(n * -257));
    }

    munmap(map, page);
    unlink(argv[1]);
    close(fd);

    if (!werrs && !merrs)
        printf("No errors detected.\n");
    else {
        if (werrs)
            printf("Detected %lu times (%.3f%%) when file contents were incorrect.\n",
                   werrs, 100.0 * (double)werrs / (double)tests);
        if (merrs)
            printf("Detected %lu times (%.3f%%) when mapping was incorrect.\n",
                   merrs, 100.0 * (double)merrs / (double)tests);
    }

    return EXIT_SUCCESS;
}

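Compile and run using, for example,

    gcc -Wall -O2 test-single.c -o single
    ./single temp 1000000

to test one million times whether the mapping and the file contents stay in sync when both accesses are done in the same process. Note that the msync() call is commented out, because on my machine it is not needed: I never see any errors or desynchronization during testing even without it.

The test rate on my machine is about 550,000 tests per second. Note that each test checks both directions, so it includes one read and one write. I simply cannot get this to detect any errors, and it is written to be quite sensitive to them.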

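The second test program uses two child processes and a POSIX realtime signal to tell the other process to check the contents. Note that the child processes open the temporary file separately. test-multi.c: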

#define  _POSIX_C_SOURCE  200809L
#define  _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

#define  NOTIFY_SIGNAL  (SIGRTMIN+0)

int mapper_process(const int fd, const size_t len)
{
    long       value = 1, count[2] = { 0, 0 };
    long      *data;
    siginfo_t  info;
    sigset_t   sigs;
    int        signum;

    if (fd == -1) {
        fprintf(stderr, "mapper_process(): Invalid file descriptor.\n");
        return EXIT_FAILURE;
    }

    data = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_NORESERVE, fd, 0);
    if (data == MAP_FAILED) {
        fprintf(stderr, "mapper_process(): Cannot map file.\n");
        return EXIT_FAILURE;
    }

    sigemptyset(&sigs);
    sigaddset(&sigs, NOTIFY_SIGNAL);
    sigaddset(&sigs, SIGINT);
    sigaddset(&sigs, SIGHUP);
    sigaddset(&sigs, SIGTERM);

    while (1) {
        /* Wait for the notification. */
        signum = sigwaitinfo(&sigs, &info);
        if (signum == -1) {
            if (errno == EINTR)
                continue;
            fprintf(stderr, "mapper_process(): sigwaitinfo() failed: %s.\n", strerror(errno));
            munmap(data, len);
            return EXIT_FAILURE;
        }
        if (signum != NOTIFY_SIGNAL)
            break;

        /* A notify signal was received. Check the write counter. */
        count[ (data[0] == value) ]++;

        /* Update. */
        data[0] = value++;
        data[1] = -(value++);

        /* Synchronize */
        /* msync(data, 2 * sizeof (data[0]), MS_SYNC | MS_INVALIDATE); */

        /* And let the writer know. */
        kill(info.si_pid, NOTIFY_SIGNAL);
    }

    /* Print statistics. */
    printf("mapper_process(): %ld errors out of %ld cycles (%.3f%%)\n",
           count[0], count[0] + count[1], 100.0 * (double)count[0] / (double)(count[0] + count[1]));
    fflush(stdout);

    munmap(data, len);
    return EXIT_SUCCESS;
}

static inline int read_from(const int fd, void *const to, const size_t len, const off_t offset)
{
    char       *p = (char *)to;
    char *const q = (char *)to + len;
    ssize_t     n;

    if (lseek(fd, offset, SEEK_SET) != offset)
        return errno = EIO;

    while (p < q) {
        n = read(fd, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1)
            return errno = EIO;
        else
        if (errno != EINTR)
            return errno;
    }

    return 0;
}

static inline int write_to(const int fd, const void *const from, const size_t len, const off_t offset)
{
    const char *const q = (const char *)from + len;
    const char       *p = (const char *)from;
    ssize_t           n;

    if (lseek(fd, offset, SEEK_SET) != offset)
        return errno = EIO;

    while (p < q) {
        n = write(fd, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1)
            return errno = EIO;
        else
        if (errno != EINTR)
            return errno;
    }

    return 0;
}

int writer_process(const int fd, const size_t len, const pid_t other)
{
    long       data[2] = { 0, 0 }, count[2] = { 0, 0 };
    long       value = 0;
    siginfo_t  info;
    sigset_t   sigs;
    int        signum;

    sigemptyset(&sigs);
    sigaddset(&sigs, NOTIFY_SIGNAL);
    sigaddset(&sigs, SIGINT);
    sigaddset(&sigs, SIGHUP);
    sigaddset(&sigs, SIGTERM);

    while (1) {

        /* Update. */
        data[0] = ++value;
        data[1] = -(value++);

        /* then write the data. */
        if (write_to(fd, data, sizeof data, 0)) {
            fprintf(stderr, "writer_process(): write_to() failed: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* Let the mapper know. */
        kill(other, NOTIFY_SIGNAL);

        /* Wait for the notification. */        
        signum = sigwaitinfo(&sigs, &info);
        if (signum == -1) {
            if (errno == EINTR)
                continue;
            fprintf(stderr, "writer_process(): sigwaitinfo() failed: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }
        if (signum != NOTIFY_SIGNAL || info.si_pid != other)
            break;

        /* Reread the file. */
        if (read_from(fd, data, sizeof data, 0)) {
            fprintf(stderr, "writer_process(): read_from() failed: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* Check the read counter. */
        count[ (data[1] == -value) ]++;
    }

    /* Print statistics. */
    printf("writer_process(): %ld errors out of %ld cycles (%.3f%%)\n",
           count[0], count[0] + count[1], 100.0 * (double)count[0] / (double)(count[0] + count[1]));
    fflush(stdout);

    return EXIT_SUCCESS;
}

int main(int argc, char *argv[])
{
    struct timespec  duration;
    double           seconds;
    pid_t            mapper, writer, p;
    size_t           page;
    siginfo_t        info;
    sigset_t         sigs;
    int              fd, status;
    char             dummy;

    if (argc != 3) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s FILENAME SECONDS\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program will test synchronization between a memory map\n");
        fprintf(stderr, "and reading/writing the underlying file.\n");
        fprintf(stderr, "The test will run for the specified time, or indefinitely\n");
        fprintf(stderr, "if SECONDS is zero, but you can also interrupt it with\n");
        fprintf(stderr, "Ctrl+C (INT signal).\n");
        fprintf(stderr, "\n");
        return EXIT_FAILURE;
    }

    if (sscanf(argv[2], " %lf %c", &seconds, &dummy) != 1) {
        fprintf(stderr, "%s: Invalid number of seconds to run.\n", argv[2]);
        return EXIT_FAILURE;
    }
    if (seconds > 0) {
        duration.tv_sec = (time_t)seconds;
        duration.tv_nsec = (long)(1000000000 * (seconds - (double)(duration.tv_sec)));
    } else {
        duration.tv_sec = 0;
        duration.tv_nsec = 0;
    }

    /* Block INT, HUP, CHLD, and the notification signal. */
    sigemptyset(&sigs);
    sigaddset(&sigs, SIGINT);
    sigaddset(&sigs, SIGHUP);
    sigaddset(&sigs, SIGCHLD);
    sigaddset(&sigs, NOTIFY_SIGNAL);
    if (sigprocmask(SIG_BLOCK, &sigs, NULL) == -1) {
        fprintf(stderr, "Cannot block the necessary signals: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    /* Create the file. */
    page = sysconf(_SC_PAGESIZE);
    fd = open(argv[1], O_RDWR | O_CREAT | O_EXCL, 0644);
    if (fd == -1) {
        fprintf(stderr, "%s: Cannot create file: %s.\n", argv[1], strerror(errno));
        return EXIT_FAILURE;
    }
    if (ftruncate(fd, page) == -1) {
        fprintf(stderr, "%s: Cannot resize file: %s.\n", argv[1], strerror(errno));
        unlink(argv[1]);
        return EXIT_FAILURE;
    }
    close(fd);
    fd = -1;

    /* Ensure streams are flushed before forking. They should be, we're just paranoid here. */
    fflush(stdout);
    fflush(stderr);

    /* Fork the mapper child process. */
    mapper = fork();
    if (mapper == -1) {
        fprintf(stderr, "Cannot fork mapper child process: %s.\n", strerror(errno));
        unlink(argv[1]);
        return EXIT_FAILURE;
    }
    if (!mapper) {
        fd = open(argv[1], O_RDWR);
        if (fd == -1) {
            fprintf(stderr, "mapper_process(): %s: Cannot open file: %s.\n", argv[1], strerror(errno));
            return EXIT_FAILURE;
        }
        status = mapper_process(fd, page);
        close(fd);
        return status;
    }

    /* Fork the writer child process. (mapper contains the PID of the mapper process.) */
    writer = fork();
    if (writer == -1) {
        fprintf(stderr, "Cannot fork writer child process: %s.\n", strerror(errno));
        unlink(argv[1]);
        kill(mapper, SIGKILL);
        return EXIT_FAILURE;
    }
    if (!writer) {
        fd = open(argv[1], O_RDWR);
        if (fd == -1) {
            fprintf(stderr, "writer_process(): %s: Cannot open file: %s.\n", argv[1], strerror(errno));
            return EXIT_FAILURE;
        }
        status = writer_process(fd, page, mapper);
        close(fd);
        return status;
    }

    /* Wait for a signal. */
    if (duration.tv_sec || duration.tv_nsec)
        status = sigtimedwait(&sigs, &info, &duration);
    else
        status = sigwaitinfo(&sigs, &info);

    /* Whatever it was, we kill the child processes. */
    kill(mapper, SIGHUP);
    kill(writer, SIGHUP);
    do {
        p = waitpid(-1, NULL, 0);
    } while (p != -1 || errno == EINTR);

    /* Cleanup. */
    unlink(argv[1]);

    printf("Done.\n");                 
    return EXIT_SUCCESS;
}

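Compile and run using, for example,

    gcc -Wall -O2 test-multi.c -o multi
    ./multi temp 10

The second parameter is the duration of the test in seconds. (You can interrupt the test safely using SIGINT (Ctrl+C) or SIGHUP.)

On my machine the test rate is about 120,000 tests per second; the msync() call is commented out here as well, because I do not see any errors or desynchronization even without it. (Also, msync(ptr, len, MS_SYNC) and msync(ptr, len, MS_SYNC | MS_INVALIDATE) are horribly slow: with either one I get fewer than 1000 tests per second, with absolutely no difference in the results. That is roughly a 100x slowdown.)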

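The MAP_NORESERVE flag to mmap() tells it to use the file itself as backing storage under memory pressure, rather than swap. If you compile the code on a system that does not recognize the flag, you can omit it. As long as the mapping is not evicted from RAM, the flag does not affect operation at all.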
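One way to keep a single source file for both kinds of systems is to define the flag as zero where the headers lack it. This is only a sketch: map_shared_file() and the demo are hypothetical names, and the /tmp path is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Where the headers do not provide MAP_NORESERVE, fall back to no flag. */
#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0
#endif

/* Hypothetical helper: map len bytes of fd as a shared, writable mapping. */
void *map_shared_file(int fd, size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_NORESERVE, fd, 0);
}

/* Demo: map one page of a scratch file and store a byte through the mapping.
 * Returns 0 on success, -1 on failure. */
int map_shared_file_demo(void)
{
    char         path[] = "/tmp/noreserve-demo-XXXXXX";   /* assumed scratch path */
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int          result = -1;
    int          fd = mkstemp(path);

    if (fd == -1)
        return -1;
    unlink(path);

    if (ftruncate(fd, (off_t)page) == 0) {
        char *map = map_shared_file(fd, page);
        if (map != MAP_FAILED) {
            map[0] = 'x';
            result = (map[0] == 'x') ? 0 : -1;
            munmap(map, page);
        }
    }
    close(fd);
    return result;
}
```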