I am implementing my own version of malloc. It is very similar to glibc's malloc in that it supports multithreading by creating arenas, which are memory areas that a thread can work in without risking competing with another thread.
My data structures are as follows:
typedef struct s_arena {
pthread_mutex_t mutex;
t_pool *main_pool;
} t_arena;
typedef struct s_arena_data {
_Atomic int arena_count;
t_arena arenas[M_ARENA_MAX];
} t_arena_data;
t_arena_data is a global variable that holds the number of arenas created so far (starting at 0 on the first call and capped at M_ARENA_MAX, which I currently define as 8), as well as an array containing all of my arenas.
An arena only contains a mutex, initialized with pthread_mutex_init(), and a pointer to a memory pool. The memory pool is irrelevant for this topic, because the race condition happens before it is ever reached.
How my program works: as each thread enters malloc, it tries to pthread_mutex_trylock the first arena's mutex. If it succeeds, all is well and it proceeds with the allocation, which I won't describe here. If it fails, several things can happen.
If the next entry in the array is empty and M_ARENA_MAX hasn't been reached, a new mutex is locked in order to create a new arena and add it to the array. That mutex is global, which means no two threads can create an arena at the same time.
If that mutex is already locked, the thread loops back to arena[0] and keeps searching for an open mutex.
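For reference, the globals that the code below relies on are declared roughly like this (just a sketch; I'm not reproducing my real definitions here, so the exact initializers are assumptions):
#include <pthread.h>

#define M_ARENA_MAX 8                 /* current cap on the number of arenas */

/* Global arena bookkeeping; zero-initialized, so arena_count starts at 0. */
t_arena_data arena_data;

/* Global mutex that serializes arena creation. */
pthread_mutex_t new_arena_mutex = PTHREAD_MUTEX_INITIALIZER;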
Now, I am fairly certain a race condition is happening because of the variable arena_count. I have observed, thanks to debug printf statements, that whenever the function segfaults, M_ARENA_MAX has not been reached; when it has been reached, the program does not crash. So I suspect that one thread might be reading the value of arena_count right before another thread increments it, and by the time it is done reading it, the incrementing thread releases new_arena_mutex and the first thread goes on with a wrong index.
This is my first multithreaded program, so I apologize if my explanation or my code is unclear, but I have spent the last 4 hours on this issue and, while I think I have narrowed the problem down, I really don't know how to fix it.
Here is the faulty part of the code:
current_arena = &arena_data.arenas[0];
int arena_index = 0;
while (pthread_mutex_trylock(&current_arena->mutex) != 0) {
printf("THREAD %p READS[0] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
if (arena_index == arena_data.arena_count - 1) {
printf("THREAD %p READS[1] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
if (pthread_mutex_trylock(&new_arena_mutex) != 0 || arena_data.arena_count == M_ARENA_MAX) {
current_arena = &arena_data.arenas[(arena_index = 0)];
continue;
}
creator = true;
break;
}
current_arena = &arena_data.arenas[arena_index++];
}
/* All arenas are occupied by other threads but M_ARENA_MAX isn't reached. Let's just create a new one. */
if (creator == true) {
printf("THREAD %p READS[2] ARENA COUNT AT %d\n", (void *)pthread_self(), arena_data.arena_count);
current_pool = create_new_pool(MAIN_POOL, chunk_type, size, pagesize, &new_arena_mutex);
if (current_pool == MAP_FAILED) return NULL;
++arena_data.arena_count;
arena_data.arenas[arena_index + 1] = (t_arena){ .main_pool = current_pool };
pthread_mutex_init(&arena_data.arenas[arena_index + 1].mutex, NULL);
pthread_mutex_lock(&arena_data.arenas[arena_index + 1].mutex);
pthread_mutex_unlock(&new_arena_mutex);
return user_area((t_alloc_chunk *)current_pool->chunk, size, &arena_data.arenas[arena_index + 1].mutex);
}
Here is one of the printf outputs that comforts me in my race-condition theory:
THREAD 0x7f9c3b216700 READS[1] ARENA COUNT AT 4
THREAD 0x7f9c3b216700 READS[2] ARENA COUNT AT 5
These values should be equal, but they are not.
Answer 0 (score: 1)
I can spot three issues in your code.
This is the race condition you describe in your question:
"Therefore I suspect that one thread might be reading the value of arena_count right before another thread increments it, and by the time it is done reading it, the incrementing thread releases new_arena_mutex and the first thread goes on creating an arena with a wrong index."
Yes, this can happen. The load of arena_data.arena_count happens atomically, but a thread may not generally assume that this value is (still) correct afterwards. The modified version in your answer does not fix the problem.
In order to fix it, the following guarantee may help: every store to arena_data.arena_count happens while holding the new_arena_mutex. As a consequence, a thread that holds the mutex can safely load arena_data.arena_count (while holding the mutex, of course) and be sure that the value does not change before the mutex is unlocked. Let me try to explain by changing and commenting your updated code:
while (pthread_mutex_trylock(&current_arena->mutex) != 0) {
if (arena_index == arena_data.arena_count - 1) {
// This thread checks the condition above _without_ holding the
// `new_arena_mutex`. Another thread may hold the mutex (and hence it
// may increment `arena_count`).
if (pthread_mutex_trylock(&new_arena_mutex) == 0) {
// Now, this thread can assume that no other thread writes to
// `arena_data.arena_count`. However, the condition
//
// arena_index == arena_data.arena_count - 1
//
// may no longer be true (because it had been checked before locking).
if (arena_data.arena_count < M_ARENA_MAX) {
// This thread may create a new arena at index
// `arena_data.arena_count`. That is safe because this thread holds
// the `new_arena_mutex` (preventing other threads from modifying
// `arena_count`.
//
// However, it is possible that `arena_index` is not at the position
// of the most recently created arena (checked _before_ locking). Let
// us just assume that all the recently created arenas are still
// locked. Hence we just skip the check and directly jump to the most
// recently created arena (as if we failed locking).
arena_index = arena_data.arena_count - 1;
current_arena = &arena_data.arenas[arena_index];
++arena_data.arena_count;
assert(
arena_index + 1 == arena_data.arena_count &&
"... and this thread is holding the mutex, so it stays true."
);
creator = true;
break;
} else {
pthread_mutex_unlock(&new_arena_mutex);
      }
    }
    // ... otherwise fall through: reset `arena_index` and `continue`, as in your loop.
  }
  // ... advance `arena_index` / `current_arena` as in your loop (see the next point).
}
In my opinion, the code would become more readable if you extracted these operations into functions such as:
// both functions return `arena_index` or `-1`
int try_find_and_lock_arena();
int try_create_and_lock_arena();
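A possible sketch of those two helpers, reusing the names from your code (arena_data, new_arena_mutex, M_ARENA_MAX) and relying on the guarantee above that arena_count is only written while new_arena_mutex is held; the pool creation itself is left to the caller:
/* Returns the index of an existing arena whose mutex is now held, or -1 if
 * every existing arena is currently busy. */
int try_find_and_lock_arena(void) {
    int count = arena_data.arena_count;// atomic snapshot; may grow afterwards
    for (int i = 0; i < count; ++i) {
        if (pthread_mutex_trylock(&arena_data.arenas[i].mutex) == 0)
            return i;// the caller now owns arenas[i].mutex
    }
    return -1;
}

/* Creates a new arena and returns its index with its mutex held, or -1 if
 * creation is not possible right now (creation in progress or table full). */
int try_create_and_lock_arena(void) {
    if (pthread_mutex_trylock(&new_arena_mutex) != 0)
        return -1;// another thread is already creating an arena
    if (arena_data.arena_count == M_ARENA_MAX) {// safe: stores happen only under the mutex
        pthread_mutex_unlock(&new_arena_mutex);
        return -1;// no room for another arena
    }
    int index = arena_data.arena_count;
    pthread_mutex_init(&arena_data.arenas[index].mutex, NULL);
    pthread_mutex_lock(&arena_data.arenas[index].mutex);
    ++arena_data.arena_count;// publish the new arena last, still under new_arena_mutex
    pthread_mutex_unlock(&new_arena_mutex);
    return index;// the caller now owns the new arena's mutex
}
malloc would then loop over try_find_and_lock_arena() and fall back to try_create_and_lock_arena(), and every lock/unlock pair would stay within one small scope.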
The post-increment operator in the following line looks wrong to me:
current_arena = &arena_data.arenas[arena_index++];// post-increment
// now, `&arena_data.arenas[arena_index]` is one beyond `current_arena`.
Written as two separate lines, the behavior may be easier to reason about:
assert(
current_arena == &arena_data.arenas[arena_index] &&
"this is an invariant I expect to hold"
);
current_arena = &arena_data.arenas[arena_index];// this is a no-op...
arena_index++;// ... and now, they are out of sync
assert(
current_arena == &arena_data.arenas[arena_index] &&
"now, the invariant is broken (and this assert should fire)"
);
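If the intent is to move on to the next arena while keeping current_arena and arena_index in sync, a pre-increment preserves the invariant (just a sketch of that intent, not necessarily the fix you want):
current_arena = &arena_data.arenas[++arena_index];// advance first, then both stay in sync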
I find it hard to match the lock/unlock operations on the mutexes across all possible paths, because they happen in different scopes.
// [new_arena_mutex is locked]
current_pool = create_new_pool(/* ... */, &new_arena_mutex);
if (current_pool == MAP_FAILED) return NULL;// error-path return
// `create_new_pool` unlocks iff it returns `MAP_FAILED`...
/* ... */
pthread_mutex_unlock(&new_arena_mutex);
// ... otherwise, the mutex is unlocked here
return user_area(/* ... */);
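One way to make the pairing easier to audit would be to keep the lock and every unlock in the same scope; this sketch assumes create_new_pool can be changed so that it never unlocks the mutex itself:
// [new_arena_mutex is locked]
current_pool = create_new_pool(/* ..., no mutex argument ... */);
if (current_pool == MAP_FAILED) {
    pthread_mutex_unlock(&new_arena_mutex);// the error path unlocks here ...
    return NULL;
}
/* ... */
pthread_mutex_unlock(&new_arena_mutex);// ... and the success path unlocks right here
return user_area(/* ... */);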
Answer 1 (score: 0)
(EDIT: no, it did not.)
This seems to have fixed the problem:
/* Look for an open arena. */
current_arena = &arena_data.arenas[0];
int arena_index = 0;
while (pthread_mutex_trylock(&current_arena->mutex) != 0) {
if (arena_index == arena_data.arena_count - 1) {
if (pthread_mutex_trylock(&new_arena_mutex) == 0) {
if (arena_data.arena_count < M_ARENA_MAX) {
++arena_data.arena_count;
creator = true;
break;
} else {
pthread_mutex_unlock(&new_arena_mutex);
}
}
current_arena = &arena_data.arenas[(arena_index = 0)];
continue;
}
current_arena = &arena_data.arenas[arena_index++];
}
/* All arenas are occupied by other threads but M_ARENA_MAX isn't reached. Let's just create a new one. */
if (creator == true) {
current_pool = create_new_pool(MAIN_POOL, chunk_type, size, pagesize, &new_arena_mutex);
if (current_pool == MAP_FAILED) return NULL;
arena_data.arenas[arena_index + 1] = (t_arena){ .main_pool = current_pool };
pthread_mutex_init(&arena_data.arenas[arena_index + 1].mutex, NULL);
pthread_mutex_lock(&arena_data.arenas[arena_index + 1].mutex);
pthread_mutex_unlock(&new_arena_mutex);
return user_area((t_alloc_chunk *)current_pool->chunk, size, &arena_data.arenas[arena_index + 1].mutex);
}