C strtok is creating an "Invalid read of size 1", unable to free(token)

Date: 2017-01-30 02:55:46

Tags: c valgrind free strtok

I am trying to debug this simple C program:

#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_SIZE 60
int wordCnt = 0;

int main(void){

//open dictionary 
FILE *ptr = fopen("large", "r");
if(ptr == NULL){
  printf("unable to open %s","large");
}

//get file size 
int fileSize;
fseek(ptr, 0 , SEEK_END);
fileSize=ftell(ptr) ;

//get memory for file buffer (read in whole file at once, faster) 
char * buffer = malloc(sizeof(char)*fileSize);

//rewind and read in file
fseek(ptr, 0 , SEEK_SET);
fread(buffer, fileSize, 1, ptr);

//get memory for longest word
char * token = malloc(sizeof(char)*MAX_WORD_SIZE);

//this is the part that causes the problem

while (token != NULL)
{
    if(wordCnt == 0)token = strtok(buffer, "\r\n");
    else token = strtok(NULL, "\r\n");

    wordCnt++;
}
wordCnt--;    

fclose(ptr);
free(token);
free(buffer);
}

Here are the error messages from valgrind:

valgrind ./test
==16233== Memcheck, a memory error detector
==16233== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==16233== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==16233== Command: ./test
==16233== 
==16233== Invalid read of size 1
==16233==    at 0x5E4496C: strtok (strtok.S:137)
==16233==    by 0x42D848: main (test.c:43)
==16233==  Address 0x62dd8bc is 0 bytes after a block of size 1,439,228 alloc'd
==16233==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16233==    by 0x42D728: main (test.c:29)
==16233== 
==16233== Invalid read of size 1
==16233==    at 0x5E4499C: strtok (strtok.S:163)
==16233==    by 0x42D848: main (test.c:43)
==16233==  Address 0x62dd8bc is 0 bytes after a block of size 1,439,228 alloc'd
==16233==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16233==    by 0x42D728: main (test.c:29)
==16233== 
==16233== 
==16233== HEAP SUMMARY:
==16233==     in use at exit: 60 bytes in 1 blocks
==16233==   total heap usage: 3 allocs, 2 frees, 1,439,856 bytes allocated
==16233== 
==16233== 60 bytes in 1 blocks are definitely lost in loss record 1 of 1
==16233==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16233==    by 0x42D7B1: main (test.c:36)
==16233== 
==16233== LEAK SUMMARY:
==16233==    definitely lost: 60 bytes in 1 blocks
==16233==    indirectly lost: 0 bytes in 0 blocks
==16233==      possibly lost: 0 bytes in 0 blocks
==16233==    still reachable: 0 bytes in 0 blocks
==16233==         suppressed: 0 bytes in 0 blocks
==16233== 
==16233== For counts of detected and suppressed errors, rerun with: -v
==16233== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

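2 Answers:

Answer 0 (score: 5)

The strtok function returns a pointer into the buffer supplied in the initial call, so you must not call free with that pointer.

The reported memory leak happens because you allocate memory and make token initially point to it. Then, in the tokenizing loop, you make token point to memory inside buffer, and the address of the 60-byte allocation is lost.

A typical loop using strtok looks like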

char *token = strtok(buffer, "\r\n");
while (token != NULL)
{
    ++wordCnt;
    token = strtok(NULL, "\r\n");
}
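
For reference, here is that loop wrapped into a small self-contained helper. This is only a sketch: the name count_tokens and its parameters are made up for illustration and do not come from the question. Note that token is never malloc'd; it only ever holds whatever strtok returns.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the tokens in text that are separated by any character in delims.
 * text must be a writable, null-terminated string, because strtok
 * modifies it in place. */
size_t count_tokens(char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t count = 0;
    for (char *token = strtok(text, delims);
         token != NULL;
         token = strtok(NULL, delims))
    {
        ++count;
    }
    return count;
}
```

Called on a writable array such as char text[] = "Hello\nWorld"; with the delimiters "\r\n", it returns 2. There is nothing to free afterwards except the buffer itself, if that was heap-allocated.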

Let's say the buffer contains the string "Hello\nWorld"

In memory it looks like

+--------+     +---+---+---+---+---+----+---+---+---+---+---+----+
| buffer | --> | H | e | l | l | o | \n | W | o | r | l | d | \0 |
+--------+     +---+---+---+---+---+----+---+---+---+---+---+----+

After you do

char *token = strtok(buffer, "\r\n");
token = strtok(NULL, "\r\n");

you have something like

+--------+     +---+---+---+---+---+----+---+---+---+---+---+----+
| buffer | --> | H | e | l | l | o | \0 | W | o | r | l | d | \0 |
+--------+     +---+---+---+---+---+----+---+---+---+---+---+----+
                                        ^
+-------+                               |
| token | ------------------------------/
+-------+

That is, strtok has overwritten the newline with '\0', and token points to the position right after it (the start of the "word" "World"), which still lies inside the memory allocated for buffer.
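
To make this visible in code, here is a small sketch that records where each token lies relative to the start of the buffer. The helper name token_offsets and its parameters are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stores the offset (relative to buffer) of each token in offsets[],
 * up to max entries, and returns how many offsets were stored.
 * buffer must be writable and null-terminated. */
size_t token_offsets(char *buffer, const char *delims,
                     size_t *offsets, size_t max)
{
    assert(buffer != NULL && delims != NULL && offsets != NULL);

    size_t n = 0;
    char *token = strtok(buffer, delims);
    while (token != NULL && n < max) {
        /* token always points somewhere inside buffer */
        offsets[n++] = (size_t)(token - buffer);
        token = strtok(NULL, delims);
    }
    return n;
}
```

For char buffer[] = "Hello\nWorld"; and the delimiters "\r\n", the stored offsets are 0 and 6, and buffer[5] has been changed from '\n' to '\0'. Neither token is a separate allocation, so neither may be passed to free.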

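There is another problem: strtok requires the string you are tokenizing to be an actual null-terminated string. Yours is not, and that is what causes the "Invalid read" errors, because strtok reads past the end of the memory allocated for buffer.

You need to allocate one more byte for buffer and set the last byte to '\0' to terminate it: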

char * buffer = malloc(fileSize + 1);  // +1 for string terminator

// Read...

buffer[fileSize] = '\0';  // Terminate strings

Note that I don't multiply by sizeof(char), because the C specification defines it to always be equal to 1.
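
Putting the pieces together, a file reader that always hands back a terminated string might look like the sketch below. The function name read_text_file is hypothetical, and opening the file in binary mode ("rb") is my choice rather than something from the question. It terminates the buffer at the number of bytes fread actually delivered, which can be less than what ftell reported.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a freshly malloc'd, null-terminated
 * buffer. Returns NULL on failure; the caller frees the result. */
char *read_text_file(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    char *buffer = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);
        if (size >= 0 && fseek(fp, 0, SEEK_SET) == 0) {
            buffer = malloc((size_t)size + 1);   /* +1 for the terminator */
            if (buffer != NULL) {
                size_t got = fread(buffer, 1, (size_t)size, fp);
                buffer[got] = '\0';              /* terminate what was read */
            }
        }
    }
    fclose(fp);
    return buffer;
}
```

The returned pointer can be handed straight to strtok, and it is the only pointer in the program that has to be passed to free.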

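Answer 1 (score: 1)

As a follow-up to Some programmer dude's explanation, you might be asking why you can't free a string by passing free() a pointer to any part of it. Couldn't the free function just walk to the beginning or the end of the allocated block?

Well, malloc allocates data in blocks. Each block typically has a header that keeps track of the size of the allocation. To be able to release the block, free() must be able to reach that header and mark it as free somehow, perhaps by setting the length to 0, perhaps by zeroing the whole header; it depends on the implementation.

The problem is that free() assumes the location of the header based on the pointer you pass to it. It cannot pick the header out from the rest of the data. It has to know where the header is relative to the pointer.

To demonstrate, let's make a pretend, trivial memory allocator.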

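#include <stddef.h>  // size_t, offsetof

// Pretend stand-in for whatever the operating system provides
void *SomeOperatingSystemMemoryAllocator(size_t Size);

typedef struct s_memory_block {
    size_t Size;     // bookkeeping header
    char Memory[1];  // first byte of the caller's data
} memory_block;

char *AllocateMemory(size_t Size)
{
    memory_block *Block;
    Block = SomeOperatingSystemMemoryAllocator(offsetof(memory_block, Memory) + Size);
    Block->Size = Size;
    return &Block->Memory[0];
}

void FreeMemory(char *Memory)
{
    memory_block *Block;
    // assume the header is right in front of the pointer
    Block = (memory_block *)(Memory - offsetof(memory_block, Memory));
    Block->Size = 0;
}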

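Obviously this is a silly and trivial example, but it might help you understand. Storing bookkeeping data in front of the pointer you return can be very useful in all sorts of ways. See Sean Barrett's stb.h stretchy buffers for an example.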
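
To give a taste of that idea without reproducing any real library, here is a minimal sketch of an int array that remembers its own length in a header placed in front of the data. All names (int_array_header, int_array_new, int_array_count, int_array_free) are invented for this example and are not the stb API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Bookkeeping header stored directly in front of the caller's data. */
typedef struct {
    size_t count;   /* number of ints the caller asked for */
    int    data[];  /* flexible array member: the caller's ints */
} int_array_header;

/* Allocates header + data in one block and returns a pointer to the data. */
int *int_array_new(size_t count)
{
    int_array_header *h = malloc(sizeof *h + count * sizeof(int));
    if (h == NULL)
        return NULL;
    h->count = count;
    return h->data;
}

/* Steps back from the data pointer to the header in front of it. */
static int_array_header *int_array_header_of(int *data)
{
    assert(data != NULL);
    return (int_array_header *)((char *)data - offsetof(int_array_header, data));
}

size_t int_array_count(int *data)
{
    return int_array_header_of(data)->count;
}

void int_array_free(int *data)
{
    if (data != NULL)
        free(int_array_header_of(data));  /* free the block's real start */
}
```

The caller uses the returned int * like a plain array, while int_array_count(a) recovers the length from the hidden header. Just as with free(), this only works when the pointer passed in is exactly the one int_array_new returned; a pointer into the middle of the array would make the header lookup land on the wrong bytes.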