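I am developing an application that uses ach_ipc for inter-process communication over shared memory. The application works fine when run natively, but we want to run it under Docker with one container per process. In deployment each process runs on its own physical machine, so one container per process is the closest approximation of the real target from a system point of view; the shared-memory implementation stands in for one of the communication networks between the physical machines. When we do this, we hit an error related to a system call. I wrote a minimal program that reproduces the problem: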
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <thread>
#include <ach.h>

int main(int argc, char** argv)
{
    // The numeric instance ID is the only command-line argument.
    if (argc < 2)
    {
        std::cerr << "Usage: " << argv[0] << " <id>" << std::endl;
        return 1;
    }
    const int id = std::atoi(argv[1]);

    // Create the channel in user-space shared memory (or reuse an existing one).
    ach_create_attr_t attributes;
    ach_create_attr_init(&attributes);
    auto status = ach_create_attr_set_map(&attributes, ACH_MAP_USER);
    if (status != ACH_OK)
    {
        std::cerr << "Create Attr Error: " << ach_result_to_string(status) << std::endl;
        return status;
    }
    int message[2];
    status = ach_create("test_channel", 1024*1024, sizeof(message), &attributes);
    if (status != ACH_OK && status != ACH_EEXIST)
    {
        std::cerr << "Create Error: " << ach_result_to_string(status) << std::endl;
        return status;
    }

    // Open the channel and discard any frames already in it.
    ach_channel channel;
    ach_attr_t open_attributes;
    ach_attr_init(&open_attributes);
    status = ach_open(&channel, "test_channel", &open_attributes);
    if (status != ACH_OK)
    {
        std::cerr << "Open Error: " << ach_result_to_string(status) << std::endl;
        return status;
    }
    status = ach_flush(&channel);
    if (status != ACH_OK)
    {
        std::cerr << "Flush Error: " << ach_result_to_string(status) << std::endl;
        return status;
    }

    // message[0] identifies the sender, message[1] is a running counter.
    message[0] = id;
    message[1] = 0;
    while (true)
    {
        message[1] += 1;
        status = ach_put(&channel, &message, sizeof(message));
        if (status != ACH_OK)
        {
            std::cerr << "Put Error: " << ach_result_to_string(status) << std::endl;
            return status;
        }

        // Start with our own ID so that a `continue` before any successful
        // read retries instead of testing an uninitialized value.
        int recv[2] = {id, 0};
        size_t frame_size = 0;
        do
        {
            status = ach_get(&channel, &recv, sizeof(recv), &frame_size, nullptr, ACH_O_WAIT | ACH_O_FIRST);
            if (status != ACH_OK)
            {
                if (status == ACH_MISSED_FRAME)
                {
                    continue;
                }
                if (status == ACH_TIMEOUT)
                {
                    continue;
                }
                std::cerr << "Get Error: " << ach_result_to_string(status) << std::endl;
                return status;
            }
            if (frame_size != sizeof(recv))
            {
                std::cerr << "Unexpected frame size: " << frame_size << std::endl;
                return 74;
            }
        } while (recv[0] == id); // skip frames that this instance sent itself
        std::cout << "Got from ID " << recv[0] << " with Data " << recv[1] << " with Length " << frame_size << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
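This program runs perfectly outside Docker, with two instances communicating with each other. When the instances are wrapped in Docker images, they run for a while and then fail, with output like the following: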
First instance:
...messages snipped...
Got from ID 7 with Data 43 with Length 8
Got from ID 7 with Data 44 with Length 8
Got from ID 7 with Data 45 with Length 8
Got from ID 7 with Data 46 with Length 8
Got from ID 7 with Data 47 with Length 8
Got from ID 7 with Data 48 with Length 8
Got from ID 7 with Data 49 with Length 8
Got from ID 7 with Data 50 with Length 8
ach_pingpong: ../nptl/pthread_mutex_lock.c:352: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.
Second instance:
...messages snipped...
Got from ID 5 with Data 44 with Length 8
Got from ID 5 with Data 45 with Length 8
Got from ID 5 with Data 46 with Length 8
Got from ID 5 with Data 47 with Length 8
Got from ID 5 with Data 48 with Length 8
Got from ID 5 with Data 49 with Length 8
Got from ID 5 with Data 50 with Length 8
Got from ID 5 with Data 51 with Length 8
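We made sure to set a suitable size for shared memory (about 1 GB), and we tried different IPC namespace variants in docker run, including attaching one container's IPC namespace to the other's and sharing the plain host namespace. The behavior is the same in every case.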
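One way to confirm from inside each container that these settings actually took effect is to print the capacity of the shared-memory mount and the identity of the namespace the process is in. The following is a minimal sketch, assuming the channel is backed by POSIX shared memory under /dev/shm and that /proc is mounted; the helper names `filesystem_capacity_bytes` and `namespace_id` are made up for illustration and are not part of ach or Docker.

```cpp
#include <sys/statvfs.h>
#include <unistd.h>
#include <limits.h>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: total capacity in bytes of the filesystem mounted
// at `path` (for example "/dev/shm"); returns 0 if the query fails.
std::uint64_t filesystem_capacity_bytes(const char* path)
{
    struct statvfs info;
    if (statvfs(path, &info) != 0)
    {
        return 0;
    }
    return static_cast<std::uint64_t>(info.f_frsize) * info.f_blocks;
}

// Hypothetical helper: the target of /proc/self/ns/<kind>, which the kernel
// reports in the form "ipc:[<inode>]"; returns an empty string on failure.
// Two processes share a namespace exactly when these strings match.
std::string namespace_id(const char* kind)
{
    const std::string link = std::string("/proc/self/ns/") + kind;
    char buffer[PATH_MAX];
    const ssize_t length = readlink(link.c_str(), buffer, sizeof(buffer) - 1);
    if (length < 0)
    {
        return "";
    }
    return std::string(buffer, static_cast<std::size_t>(length));
}
```

Calling `filesystem_capacity_bytes("/dev/shm")` and `namespace_id("ipc")` at the start of `main` in each instance and comparing the printed values between the two containers shows whether they really see the same IPC namespace. The same helper accepts the other entries under /proc/self/ns, such as "pid".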
Searching for "INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH" turns up almost nothing. Is there something I am missing here?
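For context on the assertion text: it names `__pthread_mutex_lock_full` and a `robust` flag, which suggests that the failing lock is a robust pthread mutex living in the shared memory segment. The sketch below shows how such a mutex is typically set up and locked with the standard pthread API. It assumes, without confirmation from the ach source, that the channel mutex is configured along these lines; the priority-inheritance protocol in particular is an additional assumption, and the function names are made up for illustration.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <cerrno>

// Hypothetical helper: map one anonymous shared region and initialise a
// process-shared, robust, priority-inheritance mutex inside it.
// Returns nullptr if the mapping or the initialisation fails.
pthread_mutex_t* make_shared_robust_mutex()
{
    void* memory = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (memory == MAP_FAILED)
    {
        return nullptr;
    }
    auto* mutex = static_cast<pthread_mutex_t*>(memory);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT); // assumption
    const int rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
    {
        munmap(memory, sizeof(pthread_mutex_t));
        return nullptr;
    }
    return mutex;
}

// Hypothetical helper: lock the mutex, marking it consistent again if the
// previous owner died while holding it. Returns 0 on success.
int lock_robust(pthread_mutex_t* mutex)
{
    int rc = pthread_mutex_lock(mutex);
    if (rc == EOWNERDEAD)
    {
        rc = pthread_mutex_consistent(mutex);
    }
    return rc;
}
```

A robust mutex records which thread currently owns it so that the owner's death can be reported to the next locker as `EOWNERDEAD`. `ESRCH` is the standard errno for "no such process".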