我对使用内存映射IO的前景感兴趣,最好是 利用boost :: interprocess中的工具来实现跨平台 支持,将文件中不连续的系统页面大小的块映射到 内存中连续的地址空间。
简化的具体方案:
我有许多“普通旧数据”结构,每个都是固定长度的 (小于系统页面大小。)这些结构是连接的 进入一个(非常长的)流类型&结构的位置 由那些在其中进行的结构的价值决定 流。我的目标是最大限度地减少延迟并最大限度地提高吞吐量 苛刻的并发环境。
我可以通过内存映射非常有效地读取这些数据 系统页面大小至少两倍......并建立一个新的 映射立即读取超出的结构 倒数第二个系统页面边界。这允许交互的代码 用普通的数据结构幸福地不知道这些 结构是内存映射的...例如,可以比较两个 直接使用memcmp()的不同结构,无需关心 关于页面边界。
事情变得有趣的是更新这些数据 溪流......当他们被(同时)阅读时。我的策略 喜欢使用的灵感来自系统页面大小的'Copy On Write' 粒度...基本上写'覆盖页' - 允许一个 读取旧数据的过程,而另一个读取更新的数据。
管理要使用的覆盖页面,以及何时,不一定 琐碎......这不是我主要担心的问题。我主要担心的是我可能 有一个跨越第4和第5页的结构,然后更新一个 结构完全包含在第5页...写入新页面 位置6 ...当它是“垃圾收集”时,将第5页留下 决定不再可达。这意味着,如果我映射页面 4进入位置M,我需要将第6页映射到内存位置 M + page_size ...以便能够可靠地处理结构 使用现有(非内存映射感知)函数跨页边界。
我正在努力建立最好的策略,而且我受到了阻碍 文件我觉得不完整。基本上,我需要解耦 从内存映射到该地址的地址空间分配 空间。使用mmap(),我知道我可以使用MAP_FIXED - 如果我愿意的话 明确控制映射位置......但我不清楚我是怎么做的 应该保留地址空间以便安全地执行此操作。我可以映射吗? / dev / zero为没有MAP_FIXED的两个页面,然后使用MAP_FIXED两次 将两个页面映射到显式VM地址的分配空间?如果 所以,我应该三次打电话给munmap()吗?它会泄漏资源吗? 和/或有任何其他不愉快的开销?更进一步地解决这个问题 复杂,我想在Windows上有类似的行为......有什么办法 去做这个?如果我要妥协我的话,是否有完整的解决方案? 跨平台的野心?
-
感谢您的回答,Mahmoud ......我已经读过了,并且认为我已经理解了代码......我已经在Linux下编译了它,它的行为与您的建议相符。
我主要关心的是第62行 - 使用MAP_FIXED。它对mmap做了一些假设,当我阅读我能找到的文档时,我无法确认。您将“更新”页面映射到与最初返回的mmap()相同的地址空间 - 我认为这是'正确' - 即不是恰好在Linux上运行的东西?我还需要假设它适用于文件映射和匿名映射的跨平台。
这个样本肯定让我前进......记录我最终需要的东西可能是在Linux上用mmap()实现的 - 至少。我真正喜欢的是指向文档的指针,该文档显示MAP_FIXED行将作为示例演示...并且,理想情况下,从Linux / Unix特定mmap()到独立平台的转换(Boost :: interprocess)方法。
答案 0 :(得分:7)
你的问题有点令人困惑。根据我的理解,这段代码将满足您的需求:
#define PAGESIZE 4096
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <errno.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
struct StoredObject
{
int IntVal;
char StrVal[25];
};
int main(int argc, char **argv)
{
int fd = open("mmapfile", O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600);
//Set the file to the size of our data (2 pages)
lseek(fd, PAGESIZE*2 - 1, SEEK_SET);
write(fd, "", 1); //The final byte
unsigned char *mapPtr = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
struct StoredObject controlObject;
controlObject.IntVal = 12;
strcpy(controlObject.StrVal, "Mary had a little lamb.\n");
struct StoredObject *mary1;
mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
memcpy(mary1, &controlObject, sizeof(StoredObject));
printf("%d, %s", mary1->IntVal, mary1->StrVal);
//Should print "12, Mary had a little lamb.\n"
struct StoredObject *john1;
john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
memcpy(john1, &controlObject, sizeof(StoredObject));
john1->IntVal = 42;
strcpy(john1->StrVal, "John had a little lamb.\n");
printf("%d, %s", john1->IntVal, john1->StrVal);
//Should print "12, Mary had a little lamb.\n"
//Make sure the data's on the disk, as this is the initial, "read-only" data
msync(mapPtr, PAGESIZE * 2, MS_SYNC);
//This is the inital data set, now in memory, loaded across two pages
//At this point, someone could be reading from there. We don't know or care.
//We want to modify john1, but don't want to write over the existing data
//Easy as pie.
//This is the shadow map. COW-like optimization will take place:
//we'll map the entire address space from the shared source, then overlap with a new map to modify
//This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
unsigned char *mapPtr2 = (unsigned char *) mmap(0, PAGESIZE * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
//Map the second page on top of the first mapping; this is the one that we're modifying. It is *not* backed by disk
unsigned char *temp = (unsigned char *) mmap(mapPtr2 + PAGESIZE, PAGESIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED | MAP_ANON, 0, 0);
if (temp == MAP_FAILED)
{
printf("Fixed map failed. %s", strerror(errno));
}
assert(temp == mapPtr2 + PAGESIZE);
//Make a copy of the old data that will later be changed
memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);
//The two address spaces should still be identical until this point
assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);
//We can now make our changes to the second page as needed
struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);
john2->IntVal = 52;
strcpy(john2->StrVal, "Mike had a little lamb.\n");
//Test that everything worked OK
assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
printf("%d, %s", john2->IntVal, john2->StrVal);
//Should print "52, Mike had a little lamb.\n"
//Now assume our garbage collection routine has detected that no one is using the original copy of the data
munmap(mapPtr, PAGESIZE * 2);
mapPtr = mapPtr2;
//Now we're done with all our work and want to completely clean up
munmap(mapPtr2, PAGESIZE * 2);
close(fd);
return 0;
}
我修改后的答案应该解决您的安全问题。
仅在第二次MAP_FIXED
电话上使用mmap
(就像我上面的那样)。关于MAP_FIXED
的一个很酷的事情是,它允许您覆盖现有的mmap
地址部分。它将卸载您重叠的范围,并将其替换为新的映射内容:
MAP_FIXED
[...] If the memory
region specified by addr and len overlaps pages of any existing
mapping(s), then the overlapped part of the existing mapping(s) will be
discarded. [...]
这样,你让操作系统负责为你找到数百兆的连续内存块(永远不要在你不确定的地址上调用MAP_FIXED
不可用)。然后使用您将要修改的数据在现在映射的巨大空间的子部分上调用MAP_FIXED
。多田。
在Windows上,这样的东西应该可行(我现在在Mac上,所以未经测试):
int main(int argc, char **argv)
{
HANDLE hFile = CreateFile(L"mmapfile", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
//Set the file to the size of our data (2 pages)
SetFilePointer(hFile, PAGESIZE*2 - 1, 0, FILE_BEGIN);
DWORD bytesWritten = -1;
WriteFile(hFile, "", 1, &bytesWritten, NULL);
HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE * 2, NULL);
unsigned char *mapPtr = (unsigned char *) MapViewOfFile(hMap, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE * 2);
struct StoredObject controlObject;
controlObject.IntVal = 12;
strcpy(controlObject.StrVal, "Mary had a little lamb.\n");
struct StoredObject *mary1;
mary1 = (struct StoredObject *)(mapPtr + PAGESIZE - 4); //Will fall on the boundary between first and second page
memcpy(mary1, &controlObject, sizeof(StoredObject));
printf("%d, %s", mary1->IntVal, mary1->StrVal);
//Should print "12, Mary had a little lamb.\n"
struct StoredObject *john1;
john1 = mary1 + 1; //Comes immediately after mary1 in memory; will start and end in the second page
memcpy(john1, &controlObject, sizeof(StoredObject));
john1->IntVal = 42;
strcpy(john1->StrVal, "John had a little lamb.\n");
printf("%d, %s", john1->IntVal, john1->StrVal);
//Should print "12, Mary had a little lamb.\n"
//Make sure the data's on the disk, as this is the initial, "read-only" data
//msync(mapPtr, PAGESIZE * 2, MS_SYNC);
//This is the inital data set, now in memory, loaded across two pages
//At this point, someone could be reading from there. We don't know or care.
//We want to modify john1, but don't want to write over the existing data
//Easy as pie.
//This is the shadow map. COW-like optimization will take place:
//we'll map the entire address space from the shared source, then overlap with a new map to modify
//This is mapped anywhere, letting the system decide what address we'll be using for the new data pointer
unsigned char *reservedMem = (unsigned char *) VirtualAlloc(NULL, PAGESIZE * 2, MEM_RESERVE, PAGE_READWRITE);
HANDLE hMap2 = CreateFileMapping(hFile, NULL, PAGE_READWRITE, 0, PAGESIZE, NULL);
unsigned char *mapPtr2 = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem);
//Map the second page on top of the first mapping; this is the one that we're modifying. It is *not* backed by disk
unsigned char *temp = (unsigned char *) MapViewOfFileEx(hMap2, FILE_MAP_READ | FILE_MAP_WRITE, 0, 0, PAGESIZE, reservedMem + PAGESIZE);
if (temp == NULL)
{
printf("Fixed map failed. 0x%x\n", GetLastError());
return -1;
}
assert(temp == mapPtr2 + PAGESIZE);
//Make a copy of the old data that will later be changed
memcpy(mapPtr2 + PAGESIZE, mapPtr + PAGESIZE, PAGESIZE);
//The two address spaces should still be identical until this point
assert(memcmp(mapPtr, mapPtr2, PAGESIZE * 2) == 0);
//We can now make our changes to the second page as needed
struct StoredObject *mary2 = (struct StoredObject *)(((unsigned char *)mary1 - mapPtr) + mapPtr2);
struct StoredObject *john2 = (struct StoredObject *)(((unsigned char *)john1 - mapPtr) + mapPtr2);
john2->IntVal = 52;
strcpy(john2->StrVal, "Mike had a little lamb.\n");
//Test that everything worked OK
assert(memcmp(mary1, mary2, sizeof(struct StoredObject)) == 0);
printf("%d, %s", john2->IntVal, john2->StrVal);
//Should print "52, Mike had a little lamb.\n"
//Now assume our garbage collection routine has detected that no one is using the original copy of the data
//munmap(mapPtr, PAGESIZE * 2);
mapPtr = mapPtr2;
//Now we're done with all our work and want to completely clean up
//munmap(mapPtr2, PAGESIZE * 2);
//close(fd);
return 0;
}
答案 1 :(得分:3)
但我不清楚如何安全地保留地址空间
这会因操作系统的不同而有所不同,但是对mmap的msdn(我在msdn搜索中以“xp mmap”开头)进行了一些挖掘,表明微软拥有通常的VerboseAndHelpfullyCapitalizedNames来实现mmap的各种功能。文件和匿名映射器都可以处理与任何POSIX-2001系统相同的固定地址请求,即如果地址空间中的其他内容正在与内核通信,则可以对其进行排序。我不打算“安全地”触摸,没有“安全”的代码,你想要移植到未指定的平台。您将不得不构建自己的预映射匿名内存池,您可以在自己的控制下取消映射和分区。
答案 2 :(得分:1)
我测试了来自@Mahmoud的windows代码,实际上我测试了以下类似的代码,但它不起作用(Linux代码可以正常工作。)如果取消注释VirtualFree,它将起作用。正如我在上面的评论中所提到的,在Windows上你可以使用VirtualAlloc保留地址空间,但是你不能将MapViewOfFileEx与已映射的地址一起使用,所以你需要首先使用VirtualFree。然后是一个竞争条件,其中另一个线程可以在你做之前获取内存地址,所以你必须在循环中做所有事情,例如尝试多达1000次,然后放弃。
package main
import (
"fmt"
"os"
"syscall"
)
func main() {
const size = 1024 * 1024
file, err := os.Create("foo.dat")
if err != nil {
panic(err)
}
if err := file.Truncate(size); err != nil {
panic(err)
}
const MEM_COMMIT = 0x1000
addr, err := virtualAlloc(0, size, MEM_COMMIT, protReadWrite)
if err != nil {
panic(err)
}
fd, err := syscall.CreateFileMapping(
syscall.Handle(file.Fd()),
nil,
uint32(protReadWrite),
0,
uint32(size),
nil,
)
//if err := virtualFree(addr); err != nil {
// panic(err)
//}
base, err := mapViewOfFileEx(fd, syscall.FILE_MAP_READ|syscall.FILE_MAP_WRITE, 0, 0, size, addr)
if base == 0 {
panic("mapViewOfFileEx returned 0")
}
if err != nil {
panic(err)
}
fmt.Println("success!")
}
type memProtect uint32
const (
protReadOnly memProtect = 0x02
protReadWrite memProtect = 0x04
protExecute memProtect = 0x20
protAll memProtect = 0x40
)
var (
modkernel32 = syscall.MustLoadDLL("kernel32.dll")
procMapViewOfFileEx = modkernel32.MustFindProc("MapViewOfFileEx")
procVirtualAlloc = modkernel32.MustFindProc("VirtualAlloc")
procVirtualFree = modkernel32.MustFindProc("VirtualFree")
procVirtualProtect = modkernel32.MustFindProc("VirtualProtect")
)
func mapViewOfFileEx(handle syscall.Handle, prot memProtect, offsetHigh uint32, offsetLow uint32, length uintptr, target uintptr) (addr uintptr, err error) {
r0, _, e1 := syscall.Syscall6(procMapViewOfFileEx.Addr(), 6, uintptr(handle), uintptr(prot), uintptr(offsetHigh), uintptr(offsetLow), length, target)
addr = uintptr(r0)
if addr == 0 {
if e1 != 0 {
err = error(e1)
} else {
err = syscall.EINVAL
}
}
return addr, nil
}
func virtualAlloc(addr, size uintptr, allocType uint32, prot memProtect) (mem uintptr, err error) {
r0, _, e1 := syscall.Syscall6(procVirtualAlloc.Addr(), 4, addr, size, uintptr(allocType), uintptr(prot), 0, 0)
mem = uintptr(r0)
if e1 != 0 {
return 0, error(e1)
}
return mem, nil
}
func virtualFree(addr uintptr) error {
const MEM_RELEASE = 0x8000
_, _, e1 := syscall.Syscall(procVirtualFree.Addr(), 3, addr, 0, MEM_RELEASE)
if e1 != 0 {
return error(e1)
}
return nil
}