Keeping track of every word in a text file in C

Time: 2016-02-12 14:15:26

Tags: c string

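I am trying to build a function that checks whether a word is already in a list of words and, if so, increments the frequency counter for that word. Otherwise, it should make a copy of the word, append it to the list, and set the corresponding frequency counter to 1. I get no compiler errors, but when I try to print the frequency of any word I get numbers in the 2 million range, and I don't know why. I was given a main file that I cannot modify: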

#include <stdio.h>   // FILE, fopen, fgets, printf
#include <stdlib.h>
#include <string.h>
#include <ctype.h>   // tolower
#define MAX_WORDS 300
#define LINE_LEN 80

void increment_word_freq(char *freq_words[MAX_WORDS], int *frequency, int *n, char *word);

int main(){
    char delim[] = " ,.!-;\"\n";
    char filename[] = "cookbook.txt";
    char line[LINE_LEN];
    char *word;
    char *freq_words[MAX_WORDS]; // a list of frequent words
    int frequency[MAX_WORDS]; // frequency of the words
    int n = 0; // number of words in the list
    int min_occr;
    FILE *fp;
    fp = fopen(filename, "r");
    if(!fp){
        printf("Could not open file %s\n", filename);
        exit(1);
    }

    // read one line at a time
    while(fgets(line, LINE_LEN, fp)){
        // get the words from the line
        word = strtok(line, delim);
        while(word != NULL) {
            // convert the word to lowercase
            int i;
            for(i = 0; i < strlen(word); i++)
                word[i] = tolower(word[i]);
            increment_word_freq(freq_words, frequency, &n, word);
            word = strtok(NULL,delim);
         }
    }
}

Here is the function I am trying to use:

void increment_word_freq(char *freq_words[MAX_WORDS], int *frequency, int *n, char *word){

for(int i=0; i<MAX_WORDS; i++){
    if(freq_words[i] == word){
        frequency[i]++;
        break;
    }
    else if(i=MAX_WORDS-1){
        frequency[i]= *word;
        *n++;
    }
}
}
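As I said before, there are no compiler errors, but trying to print the frequency of any word gives numbers in the 2 million range, and I don't know why. Any and all help and advice is much appreciated!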

1 answer:

Answer 0 (score: 0)

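freq_words[i] == word only compares the pointer freq_words[i] with the pointer word. You have to compare the strings the pointers refer to, so change the code to strcmp(freq_words[i], word) == 0. Apart from that, you have to allocate dynamic memory to store your strings. Use strcpy to copy the string into the dynamic memory. You have to do this because word is a pointer into the char array line, and line is overwritten when you read the next line of the file. Adapt your code like this: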

#include <stdlib.h> // malloc, free
#include <string.h> // strcmp, strcpy, strlen

void increment_word_freq( char *freq_words[MAX_WORDS], int *frequency, int *n, char *word)
{
    for ( int i=0; i < *n; i++) // for all current members of freq_words
    {
        if ( strcmp( freq_words[i], word ) == 0 ) // test if word is member of freq_words
        {
            frequency[i]++; // increment count
            return;         // finished, because word was found 
        }
    }

    // word was not found in freq_words => add new word to freq_words 
    if ( *n < MAX_WORDS ) // test if there is one more place in freq_words
    {
       freq_words[*n] = malloc( strlen(word) + 1 );   // allocate dynamic memory for new member of freq_words
       if ( freq_words[*n] == NULL )                  // allocation failed => leave the list unchanged
           return;
       strcpy( freq_words[*n], word );                // copy word to freq_words[*n]
       frequency[*n] = 1;                             // init frequency[*n] with 1
       (*n)++;                                        // increment count of members of freq_words
    }
}
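
The two points above (compare string contents rather than addresses, and keep your own copy of each word) can be isolated in a small sketch. The helper names copy_word and same_word are hypothetical and are not part of the code in the question; this only illustrates the idea and is not a replacement for increment_word_freq.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of s,
   or NULL if the allocation fails. The caller must free() it. */
char *copy_word(const char *s)
{
    char *p = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (p != NULL)
        strcpy(p, s);
    return p;
}

/* Hypothetical helper: returns 1 if a and b hold the same characters,
   even when they are stored at different addresses, otherwise 0. */
int same_word(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

A pointer that merely aliases line sees whatever the next fgets call writes into the buffer, while a copy made with copy_word keeps the original text. That is why the list has to own its strings.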

Note that you must free the allocated memory at the end of main, otherwise you will leak memory.

for ( int i=0; i < n; i++) // in main, n is an int, not a pointer
{
    free( freq_words[i] );
}
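
If you are allowed to add helpers next to increment_word_freq, the cleanup could also be wrapped in a function. This is only a sketch: free_word_list is a hypothetical name, and it assumes every one of the first *n entries was allocated with malloc as in the function above.

```c
#include <stdlib.h>

/* Hypothetical helper: frees the first *n entries of freq_words,
   clears the pointers, and resets the word count to 0. */
void free_word_list(char *freq_words[], int *n)
{
    for (int i = 0; i < *n; i++) {
        free(freq_words[i]);
        freq_words[i] = NULL;   /* avoid leaving a dangling pointer behind */
    }
    *n = 0;
}
```

In main this would be called as free_word_list(freq_words, &n) after the reading loop, in the same way that &n is passed to increment_word_freq.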