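I have two threads in one process. One mallocs packets and writes them to a global linked list. The other continuously reads packets from that list, sends them out through hardware calls, and then frees the memory. This code handles a large number of packets at a high rate.
Everything works fine except for one isolated case, where the process aborted because of what appears to be a failed malloc. That is strange, because the malloc man page says that malloc simply returns NULL when it fails. Are there other ways a call to malloc() can fail that would crash the process, as happened in my case?
Here is the backtrace from gdb: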
#0 0xffffe430 in __kernel_vsyscall ()
No symbol table info available.
#1 0xf757cc10 in raise () from /lib/libc.so.6
No symbol table info available.
#2 0xf757e545 in abort () from /lib/libc.so.6
No symbol table info available.
#3 0xf75b94e5 in __libc_message () from /lib/libc.so.6
No symbol table info available.
#4 0xf75bf3d4 in malloc_printerr () from /lib/libc.so.6
No symbol table info available.
#5 0xf75c1f5a in _int_malloc () from /lib/libc.so.6
No symbol table info available.
#6 0xf75c3dd4 in malloc () from /lib/libc.so.6
No symbol table info available.
#7 0x080a2466 in np_enqueue_packet_to_tx_queue (prio=2, pkt_type=1 '\001', tx_host_handle=162533812, packet_length=40,
    pTxData=0x14dfa694 "", dlci=474, vfport=71369178) at ./np_tx.c:173
No locals.
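Below is the code for the sender thread, where the malloc fails. The sender thread mallocs memory (an operation protected by a mutex) and writes to the global queue (also protected by a mutex). From the core dump in gdb, I can see that the first malloc succeeded and that the second one failed and caused the core dump.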
void np_enqueue_packet_to_tx_queue(int prio, WP_U8 pkt_type,
                                   WP_handle tx_host_handle,
                                   WP_S32 packet_length, WP_CHAR *pTxData,
                                   WP_U32 dlci, WP_U32 vfport)
{
    STRU_TX_QUEUE_NODE *packetToSend;

    packetToSend = malloc(sizeof(STRU_TX_QUEUE_NODE));
    if (packetToSend == NULL)
    {
        WDDI_ERR(" Cannot allocate new memory in np_enqueue_packet_to_tx_queue\n");
        return;
    }
    memset(packetToSend, 0, sizeof(STRU_TX_QUEUE_NODE));

    packetToSend->packet = (WP_CHAR*)malloc(packet_length);
    if (packetToSend->packet == NULL)
    {
        WDDI_ERR(" Cannot allocate new memory in np_enqueue_packet_to_tx_queue\n");
        free(packetToSend);
        packetToSend = NULL;
        return;
    }
    memset(packetToSend->packet, 0, packet_length);

    packetToSend->pkt_type = pkt_type;
    packetToSend->packet_length = packet_length;
    memcpy(packetToSend->packet, pTxData, packet_length);
    if (pkt_type == PACKET_TYPE_FR)
    {
        packetToSend->fr_tx_info.tx_host_handle = tx_host_handle;
        packetToSend->fr_tx_info.dlci = dlci;
        packetToSend->fr_tx_info.vfport = vfport;
    }

    pthread_mutex_lock(&tx_queue_mutex);
    if (prio == PRIO_HIGH)
    {
        write_packet_to_tx_queue(&high_prio_tx_queue_g, packetToSend);
    }
    else
    {
        write_packet_to_tx_queue(&low_prio_tx_queue_g, packetToSend);
    }
    pthread_mutex_unlock(&tx_queue_mutex);

    // wakeup Tx thread
    pthread_cond_signal(&tx_queue_cond);
}
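Can someone help point out what might be going wrong here?
Here is the code for the reader thread. It reads some data from the global queue (an operation protected by the mutex), releases the mutex, does some processing on the data, and then frees the data's memory (this operation is not protected by the mutex).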
void *tx_thread(void *arg)
{
    STRU_TX_QUEUE_NODE *pickedUpPackets[TX_NUM_PACKETS_BUFFERED];
    int read_counter, send_counter;

    while (1)
    {
        pthread_mutex_lock(&tx_queue_mutex);
        while ((high_prio_tx_queue_g.len == 0) && (low_prio_tx_queue_g.len == 0))
        {
            pthread_cond_wait(&tx_queue_cond, &tx_queue_mutex);
        }
        if (high_prio_tx_queue_g.len)
        {
            for (read_counter = 0; read_counter < TX_NUM_PACKETS_BUFFERED; read_counter++)
            {
                pickedUpPackets[read_counter] = read_packet_from_tx_queue(&high_prio_tx_queue_g);
                if (pickedUpPackets[read_counter] == NULL)
                {
                    break;
                }
            }
        }
        else if (low_prio_tx_queue_g.len)
        {
            for (read_counter = 0; read_counter < TX_NUM_PACKETS_BUFFERED; read_counter++)
            {
                pickedUpPackets[read_counter] = read_packet_from_tx_queue(&low_prio_tx_queue_g);
                if (pickedUpPackets[read_counter] == NULL)
                {
                    break;
                }
            }
        }
        pthread_mutex_unlock(&tx_queue_mutex);

        for (send_counter = 0; send_counter < read_counter; send_counter++)
        {
            np_host_send(pickedUpPackets[send_counter]);
        }
    }
}

void np_host_send(STRU_TX_QUEUE_NODE *packetToSend)
{
    if (packetToSend == NULL)
    {
        return;
    }
    // some hardware calls
    free(packetToSend->packet);
    packetToSend->packet = NULL;
    free(packetToSend);
    packetToSend = NULL;
}
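To make the pattern easier to reproduce in isolation, here is a stripped-down sketch of the handoff described above: the sender allocates a node and its payload, links it into the queue under a mutex, and the reader unlinks it under the same mutex and frees it outside the lock. All names here (`pkt_node`, `pkt_queue`, `queue_init`, `queue_push`, `queue_pop`, `pkt_free`) are hypothetical stand-ins, not the real `write_packet_to_tx_queue` / `read_packet_from_tx_queue`, whose bodies are not shown in this post. The condition-variable wakeup and the two priority levels are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; stands in for STRU_TX_QUEUE_NODE. */
typedef struct pkt_node {
    struct pkt_node *next;
    size_t length;
    char *packet;
} pkt_node;

/* Hypothetical singly linked FIFO protected by one mutex. */
typedef struct {
    pkt_node *head;
    pkt_node *tail;
    size_t len;
    pthread_mutex_t lock;
} pkt_queue;

void queue_init(pkt_queue *q)
{
    q->head = NULL;
    q->tail = NULL;
    q->len = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Copies `length` bytes (assumed > 0) from `data` into a new node.
 * Returns 0 on success, -1 if either allocation fails. */
int queue_push(pkt_queue *q, const char *data, size_t length)
{
    pkt_node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return -1;
    n->packet = malloc(length);
    if (n->packet == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->packet, data, length);
    n->length = length;

    /* Only the linking step touches shared state. */
    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    q->len++;
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Unlinks and returns the oldest node, or NULL if the queue is empty.
 * After this returns, the caller is the sole owner of the node. */
pkt_node *queue_pop(pkt_queue *q)
{
    pthread_mutex_lock(&q->lock);
    pkt_node *n = q->head;
    if (n != NULL) {
        q->head = n->next;
        if (q->head == NULL)
            q->tail = NULL;
        n->next = NULL;
        q->len--;
    }
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Frees the payload and then the node; safe to call with NULL. */
void pkt_free(pkt_node *n)
{
    if (n != NULL) {
        free(n->packet);
        free(n);
    }
}
```

In this sketch a node is reachable from only one place at a time: either the queue or the thread that popped it. That is the invariant the free outside the mutex relies on.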