Array of tokenized string literals in C

Time: 2017-10-11 20:33:00

Tags: c arrays string pointers tokenize

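I am writing a C program that tokenizes an input text file and tracks the frequency of word lengths, while also storing the corresponding words themselves. The word count works correctly, but I cannot get my word_tracker array to store the strings properly: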

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#define MAX_LENGTH 34
#define MAX_WORDS 750

int main(int argc, char *argv[]){ 

    FILE *fp; //input file
    const char *cur; //stores current word as string literal
    char words[MAX_LENGTH*MAX_WORDS]; //stores all words from text file
    char file_name[100]; //stores file name
    int word_count[MAX_LENGTH] = {0}; //array to store frequency of words based on length
    const char *word_tracker[MAX_LENGTH][MAX_WORDS]; //array to store string literals of each word, indexed by char count and 
    int char_count; //current word's char count

    printf("Enter a file name: ");
    scanf("%s", file_name);
    fp = fopen(file_name, "r"); 

    if((fp==NULL)){
        printf("Failure: missing or unopenable file");
        return -1; 
    }else{
        while(fgets(words, sizeof(words), fp)){
            cur= strtok(words, " -.,\b\t\n"); //first word of line
            char_count = strlen(cur);
            word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
            word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

            /*test printing*/
            printf("%d:", char_count-1); 
            printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); 

            while(cur){
                    cur = strtok(NULL, " -.,\b\t\n"); //next word
                    if(cur){
                        char_count = strlen(cur);
                        word_count[char_count-1] = word_count[char_count-1]+1; //increment frequency of specific word length
                        word_tracker[char_count-1][word_count[char_count-1]-1] = cur; //store string into corresponding array index location

                        /*test printing*/
                        printf("%d:", char_count-1); //test print
                        printf("%s ", word_tracker[char_count-1][(word_count[char_count-1])-1]); //test print

                    }
                }
            }
        }
//Testing word_tracker: (This doesn't work)
    printf("\n\n%s \n", word_tracker[0][0]);
    printf("\n%s \n", word_tracker[1][0]);
    printf("%s \n", word_tracker[2][0]);
    printf("%s \n", word_tracker[3][0]);
    printf("%s \n", word_tracker[4][0]);
    printf("%s \n", word_tracker[5][0]);

    return 0;
}

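The "inner" tests (inside the tokenizing loop) work fine and print the correct strings and lengths. However, the test prints at the end of main show seemingly random strings rather than the ones the input text file says should be stored there. I have three theories about what I am doing wrong: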

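1) My indexing is wrong

2) My understanding of how to fill and use an array of char * is incorrect

3) My tokenizing loop is incorrect (cur does not hold an "isolated string"?)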

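I noticed that the strings shown by the tests at the end of main come from the last line of the input file, so I suspect my tokenizing loop is at fault. Any guidance is much appreciated, thanks!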

1 Answer:

Answer 0 (score: 0)

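Your result array is currently const char *word_tracker[MAX_LENGTH][MAX_WORDS], which is a 2D array of pointers. Every pointer that strtok returns points into the words buffer, and fgets overwrites that buffer on each line, so the stored pointers end up showing whatever the buffer holds after the last line was read. You can either (a) use a one-dimensional array of pointers and allocate memory for each word found, or (b) use a two-dimensional array of chars and strcpy each word into the right place.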
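The stale strings reported in the question can be reproduced in isolation. In this minimal sketch (the name stale_token_demo and the sample strings are made up for illustration), a pointer returned by strtok is saved, the buffer is refilled the way fgets refills words on each line, and the saved pointer is read back:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: writes into out whatever a saved token pointer
 * shows after the buffer it points into has been reused. */
void stale_token_demo(char *out, size_t out_size)
{
    char buf[32];

    strcpy(buf, "alpha beta");             /* stands in for line 1 */
    const char *saved = strtok(buf, " ");  /* points at buf[0], i.e. "alpha" */

    strcpy(buf, "gamma delta");            /* stands in for line 2, same buffer */

    /* saved still points at buf[0], which now holds the new line. */
    snprintf(out, out_size, "%s", saved);
}
```

Because strtok returns a pointer into buf rather than a copy, out receives the second line's text, not "alpha".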

So (a) would look like...

const char *word_tracker[MAX_WORDS];
...
word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS] = strdup(cur);
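A fuller sketch of approach (a), under a few assumptions: the names dup_word, store_words and free_words are hypothetical, words are simply appended in the order found, and the delimiter set is copied from the question. strdup is a POSIX function (part of ISO C only since C23), so the sketch uses a small malloc-based stand-in to stay portable:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s, like POSIX strdup. */
char *dup_word(const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Tokenizes line in place and appends a private copy of each word to
 * tracker, starting at index count. Returns the new number of words. */
size_t store_words(char *line, char *tracker[], size_t count, size_t max_words)
{
    const char *delims = " -.,\b\t\n";   /* delimiters from the question */

    for (char *cur = strtok(line, delims);
         cur != NULL && count < max_words;
         cur = strtok(NULL, delims)) {
        char *copy = dup_word(cur);
        if (copy == NULL)
            break;                       /* out of memory: stop storing */
        tracker[count++] = copy;
    }
    return count;
}

/* Releases every copy made by store_words. */
void free_words(char *tracker[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(tracker[i]);
}
```

Each stored pointer now owns its own memory, so reading the next line into the same buffer no longer affects words stored earlier. The copies should be released with free_words once they are no longer needed.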

And (b) would look like

char word_tracker[MAX_WORDS][MAX_LENGTH];
...
strncpy(word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS], cur, MAX_LENGTH);
word_tracker[someIndexWithSomeMeaningUpToMAX_WORDS][MAX_LENGTH-1] = '\0';

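Note that in (b), MAX_LENGTH is the maximum length of a string (i.e. a single word) and is therefore the second index. strncpy ensures that no more than the space reserved for one word is written. It does not add a terminator when the word fills that space, which is why the last byte is set to '\0' explicitly.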
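Approach (b) can be sketched the same way. The function name store_words_fixed is hypothetical, MAX_LENGTH uses the value from the question, and words are again appended in the order found. The copy is bounded by MAX_LENGTH - 1 so that the final byte is always free for the terminator:

```c
#include <stddef.h>
#include <string.h>

#define MAX_LENGTH 34   /* same value as in the question */

/* Tokenizes line in place and copies each word into the next row of
 * tracker, starting at row count. Words longer than MAX_LENGTH - 1
 * characters are truncated. Returns the new number of stored words. */
size_t store_words_fixed(char *line, char tracker[][MAX_LENGTH],
                         size_t count, size_t max_words)
{
    const char *delims = " -.,\b\t\n";   /* delimiters from the question */

    for (char *cur = strtok(line, delims);
         cur != NULL && count < max_words;
         cur = strtok(NULL, delims)) {
        strncpy(tracker[count], cur, MAX_LENGTH - 1);
        tracker[count][MAX_LENGTH - 1] = '\0';   /* always terminated */
        count++;
    }
    return count;
}
```

Compared with (a), this needs no malloc or free, at the cost of reserving MAX_LENGTH bytes for every slot and truncating anything longer.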