Question

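I am trying to write the unique strings from a file into a linked list and increment a count for each repeated word. I want to use a getNextWord function that returns a pointer to the next word in the file. The problem is that I am very new to C and pointers, so I don't actually know what to do in my main function to call getNextWord, or how to use the returned string pointer to access the string it points to. So how do I use my function to get the string that should serve as the node's key? Any other advice would also be appreciated, and if you see anything wrong with my function or struct, please let me know! Thank you very much for your time. Here are my function and struct: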

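#include <ctype.h>   // isalnum, isspace, tolower
#include <stdio.h>   // FILE, fgetc, EOF
#include <string.h>  // strdup (POSIX)

#define MAX_WORD_LEN 256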

struct list {
    int count;
    char string[MAX_WORD_LEN];
    struct list *next;
};

char* getNextWord(FILE* fd) {
    int c;  // int rather than char: fgetc returns int so EOF can be told apart from real characters
    char wordBuffer[MAX_WORD_LEN];
    int putChar = 0;

    while((c = fgetc(fd)) != EOF) {
        if(isalnum(c)) break;
    }
    if (c == EOF) return NULL;

    wordBuffer[putChar++] = tolower(c);

    while((c = fgetc(fd)) != EOF) {
        if(isspace(c) || putChar >= MAX_WORD_LEN -1) break;

        if(isalnum(c)) {
            wordBuffer[putChar++] = tolower(c);
        }
    }
    wordBuffer[putChar] = '\0';
    return strdup(wordBuffer);
}

Answer 1

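I see a small discrepancy between your list node definition and the function that retrieves words from the file.

In C you are fully responsible for memory allocation, so you have to decide who allocates the data and who frees it.

Here your file-parsing function does the allocation. That means the pointer returned by getNextWord refers to dynamic memory that must be freed at some point.

At the same time, your node structure contains another memory buffer meant to hold the same piece of data.

With the current implementation, you have to copy the string obtained from getNextWord into your node and then free the string, like this: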

char * new_word = getNextWord (file);
if (new_word != NULL) {   // getNextWord returns NULL at end of file
    strcpy (my_node->string, new_word);
    free (new_word);
}

This is a waste of time and resources: every character is copied twice (once inside getNextWord and once more by strcpy).
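To connect this back to the counting the question asks about, here is a minimal sketch of how a word could be added to the list with the original fixed-buffer struct. The helper name addWord and the choice to prepend new nodes are assumptions made for illustration; they do not come from the question.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 256

struct list {
    int count;
    char string[MAX_WORD_LEN];
    struct list *next;
};

// Hypothetical helper: if `word` is already in the list, increment its
// count; otherwise prepend a new node with count 1.
// Returns the matching node, or NULL if malloc fails.
struct list *addWord(struct list **head, const char *word) {
    for (struct list *n = *head; n != NULL; n = n->next) {
        if (strcmp(n->string, word) == 0) {
            n->count++;
            return n;
        }
    }

    struct list *node = malloc(sizeof *node);
    if (node == NULL) return NULL;

    node->count = 1;
    strncpy(node->string, word, MAX_WORD_LEN - 1);  // copy into the node's own buffer
    node->string[MAX_WORD_LEN - 1] = '\0';          // guarantee termination
    node->next = *head;
    *head = node;
    return node;
}
```

In main, this would typically be driven by a loop that calls getNextWord(file) until it returns NULL, passes each word to addWord(&head, word), and then frees the word, since the node keeps its own copy.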

To avoid the duplication, you can basically do one of two things.

1) Change the string field to a char * that holds a reference to the result of getNextWord

struct list {
    int count;
    char * string;   // <- reference to the string allocated by getNextWord
    struct list *next;
};

// populating node
my_node->string = getNextWord (file);

In this case, each node is responsible for eventually freeing its string.
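What that ownership means in practice might look like the sketch below. The helper names addOwnedWord and freeList are hypothetical, and the sketch assumes every string handed to the list was heap-allocated (as getNextWord's strdup result is).

```c
#include <stdlib.h>
#include <string.h>

struct list {
    int count;
    char *string;        // owned by the node, e.g. a getNextWord result
    struct list *next;
};

// Hypothetical helper: takes ownership of the heap-allocated `word`.
// If the word is already present, its count is incremented and the
// duplicate string is freed immediately; otherwise a new node keeps it.
struct list *addOwnedWord(struct list **head, char *word) {
    for (struct list *n = *head; n != NULL; n = n->next) {
        if (strcmp(n->string, word) == 0) {
            n->count++;
            free(word);          // the list already holds this word
            return n;
        }
    }

    struct list *node = malloc(sizeof *node);
    if (node == NULL) {
        free(word);              // still our responsibility on failure
        return NULL;
    }
    node->count = 1;
    node->string = word;         // no copy: the node now owns the pointer
    node->next = *head;
    *head = node;
    return node;
}

// Releases every node together with the string it owns.
void freeList(struct list *head) {
    while (head != NULL) {
        struct list *next = head->next;
        free(head->string);
        free(head);
        head = next;
    }
}
```

The design point is that exactly one place frees each string: either addOwnedWord (for duplicates) or freeList (for strings stored in nodes).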

2) Keep string as a char buffer and have getNextWord fill it directly

typedef char wordBuffer_t[MAX_WORD_LEN];

struct list {
    int          count;
    wordBuffer_t string; // <- storage for the word inside each node
    struct list *next;
};

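char* getNextWord(FILE* fd, wordBuffer_t wordBuffer) {
    // rest of the code does not change, except that the local wordBuffer
    // declaration and the final strdup go away (return wordBuffer instead)
}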

// populating node
getNextWord (file, my_node->string);

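In this case no dynamic allocation is needed. On the other hand, every node has to reserve enough space for the longest possible string, which is inefficient in terms of memory consumption.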
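Spelled out in full, the second option could look like the sketch below. It keeps the parsing logic from the question and assumes the function returns the caller's buffer on success and NULL at end of file. Declaring c as int is an adjustment of mine so that EOF compares reliably.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD_LEN 256

typedef char wordBuffer_t[MAX_WORD_LEN];

// Variant of getNextWord that writes the next word into the caller's
// buffer instead of allocating. Returns wordBuffer, or NULL at end of file.
char *getNextWord(FILE *fd, wordBuffer_t wordBuffer) {
    int c;              // int, so EOF is distinguishable from any character
    int putChar = 0;

    // Skip everything up to the first alphanumeric character.
    while ((c = fgetc(fd)) != EOF) {
        if (isalnum(c)) break;
    }
    if (c == EOF) return NULL;

    wordBuffer[putChar++] = (char)tolower(c);

    // Collect alphanumeric characters until whitespace or a full buffer.
    while ((c = fgetc(fd)) != EOF) {
        if (isspace(c) || putChar >= MAX_WORD_LEN - 1) break;

        if (isalnum(c)) {
            wordBuffer[putChar++] = (char)tolower(c);
        }
    }
    wordBuffer[putChar] = '\0';
    return wordBuffer;
}
```

A caller would pass the node's own buffer, as in getNextWord(file, my_node->string), and treat a NULL return as the end of the input.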

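As an important note, I recommend systematically using a typedef for any array that may be used as a function parameter (here wordBuffer_t is passed to getNextWord).
It will make your code more readable and save you from this common C pitfall.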

Answer 2

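Not a complete answer, but strtok is a good function for this kind of thing. You need to load the whole string you want to parse into memory (rather than reading one character at a time from the file), but after that essentially all the memory is already allocated. Here is something to get you going: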

char myString[BUFLEN];
char *nextWord;
fgets(myString, BUFLEN, stdin); // if you get from file, change this accordingly 

nextWord = strtok(myString, " ;,.:\t\n");  // use whatever delimiters you expect
printf("The first word is %s\n", nextWord); // strtok returns pointer to start of match
strncpy (my_node->string, nextWord, MAX_WORD_LEN-1); // copy the word to your structure
my_node->string[MAX_WORD_LEN-1]='\0';  // in case word was too long - prevent trouble

while((nextWord = strtok(NULL, " .,;:\t\n")) != NULL) {
  printf("The next word is %s\n", nextWord);
  strncpy (other_node->string, nextWord, MAX_WORD_LEN-1); // copy the word to another node in your structure
  other_node->string[MAX_WORD_LEN-1]='\0';  // in case word was too long - prevent trouble
  // … do whatever else to manage your list
}

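This is just to show you how to use strtok to extract words from a string. The interesting part is that you call strtok with the address of the string only once; after that you pass NULL (it keeps track internally of how far into the string it has "eaten"). Note that while it works through the string, strtok inserts '\0' characters, which means that after running a string through strtok you cannot reuse the original string. Otherwise it is a useful function for your purposes.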
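The call pattern described here (the string on the first call, NULL afterwards) can be wrapped in a small function. This is only a sketch: the name splitWords and the fixed delimiter set are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: splits `line` in place and stores a pointer to each
// word in words[], up to maxWords entries. Returns the number stored.
// The pointers refer into `line`, which strtok modifies by writing '\0'.
size_t splitWords(char *line, char *words[], size_t maxWords) {
    const char *delims = " ;,.:\t\n";
    size_t n = 0;

    // First call passes the string itself; later calls pass NULL so that
    // strtok continues from where it stopped.
    for (char *w = strtok(line, delims);
         w != NULL && n < maxWords;
         w = strtok(NULL, delims)) {
        words[n++] = w;
    }
    return n;
}
```

Because the returned pointers refer into the original buffer, each word still has to be copied into a node (or the buffer kept alive) before the buffer is reused for the next line.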
