Reading words from a file without punctuation for spell checking, while still writing the original punctuation to the output

Posted: 2012-04-04 18:32:08

Tags: c string

Hi, I'm writing a spell checker in C that holds the dictionary in an array of strings and uses binary search to look words up in it.
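As a rough sketch of the lookup described here, assuming the dictionary array is sorted in strcmp order and holds lowercase words, a recursive binary search with the same signature as the binarySearch prototype declared in the question's code could look like this. The body is an assumption; the question does not show its implementation.

```c
#include <string.h>

/* Recursive binary search over a sorted array of C strings.
   Assumes pArray3[low..high] is sorted in strcmp order and that
   value is already lowercased to match the dictionary entries.
   Returns the index of the match, or -1 if value is absent. */
int binarySearch(char **pArray3, int low, int high, char *value)
{
    int mid, cmp;

    if (low > high)
        return -1;                      /* empty range: not found */

    mid = low + (high - low) / 2;       /* avoids overflow of low + high */
    cmp = strcmp(value, pArray3[mid]);

    if (cmp == 0)
        return mid;                     /* exact match */
    if (cmp < 0)
        return binarySearch(pArray3, low, mid - 1, value);   /* left half */
    return binarySearch(pArray3, mid + 1, high, value);      /* right half */
}
```

Called as binarySearch(dict, 0, count-1, temp), exactly as in the loop below; an empty dictionary (count == 0) gives high == -1 and returns -1 immediately.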

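My problem is that I'm reading text from a file and writing it back out to a new file with the misspelled words highlighted like this: **spellingmistake**. The file will also contain characters such as . , ! and so on. These should be written to the new file, but obviously must not be part of the word when it is compared against the dictionary.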

So I want this:

text file: "worng!"

new file: "**worng**!"

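I've tried to solve this as best I can and have spent a long time on Google, but I'm no closer to a solution. So far I've written the code below, which reads one character at a time and fills two char arrays: a lowercase one for the dictionary comparison and one holding the original word. This works when there is no punctuation, but when punctuation is present I lose the spaces that follow it. I'm sure there's a better way to do this, but I can't find it, so any pointers would be appreciated.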

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#define MAX_STRING_SIZE 29  /*define longest non-technical word in english dictionary plus 1*/

/*function prototypes*/
int dictWordCount(FILE *ptrF);  /*counts and returns number of words in dictionary*/
void loadDictionary(char ***pArray1, FILE *ptrFile, int counter);   /*create dictionary array from file based on word count*/
void printDictionary(char **pArray2, int size); /*prints the words in the dictionary*/
int binarySearch(char **pArray3, int low, int high, char *value);   /*recursive binary search on char array*/

int main(int argc, char *argv[]){
    int i;  /*index*/
    FILE *pFile;    /*pointer to dictionary file*/
    FILE *pInFile;  /*pointer to text input file*/
    FILE *pOutFile; /*pointer to text output file*/
    char **dict;    /*pointer to array of char pointer - dictionary*/
    int count;      /*number of words in dictionary*/
    int dictElement;    /*element the word has been found at returns -1 if word not found*/

    char input[MAX_STRING_SIZE];    /*input to find in dictionary*/
    char temp[MAX_STRING_SIZE];
    char ch;    /*store each char as read - checking for punctuation or space*/
    int numChar = 0; /*number of char in input string*/

    /*************************************************************************************************/
    /*open dictionary file*/
    pFile = fopen("dictionary.txt", "r");   /*open file dictionary.txt for reading*/
    if(pFile==NULL){    /*if file can't be opened*/
        printf("ERROR: File could not be opened!\n");
        exit(EXIT_FAILURE);
    }

    count = dictWordCount(pFile);
    printf("Number of words is: %d\n", count);

    /*Load Dictionary into array*/
    loadDictionary(&dict, pFile, count);

    /*print dictionary*/
    //printDictionary(dict, count);
    /*************************************************************************************************/
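    /*open input file for reading*/
    if(argc < 3){   /*need both an input and an output file name*/
        printf("Usage: %s input.txt output.txt\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    pInFile = fopen(argv[1], "r");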
    if(pInFile==NULL){  /*if file can't be opened*/
        printf("ERROR: File %s could not be opened!\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    /*open output file for writing*/
    pOutFile = fopen(argv[2], "w");
    if(pOutFile==NULL){ /*if file can't be opened*/
        printf("ERROR: File could not be created!\n");
        exit(EXIT_FAILURE);
    }

    do{
        ch = fgetc(pInFile);                /*read char from file*/

        if(isalpha((unsigned char)ch)){     /*if char is alphabetical char*/
            //printf("char is: %c\n", ch);
            input[numChar] = ch;            /*put char into input array*/
            temp[numChar] = tolower(ch);    /*put char in temp in lowercase for dictionary check*/
            numChar++;                      /*increment char array element counter*/
        }
        else{
            if(numChar != 0){
                input[numChar] = '\0';  /*add end of string char*/
                temp[numChar] = '\0';

                dictElement = binarySearch(dict,0,count-1,temp);    /*check if word is in dictionary*/

                if(dictElement == -1){  /*word not in dictionary*/
                    fprintf(pOutFile,"**%s**%c", input, ch);
                }
                else{   /*word is in dictionary*/
                    fprintf(pOutFile, "%s%c", input, ch);
                }
                numChar = 0;    /*reset numChar for next word*/
            }
        }
    }while(ch != EOF);

    /*******************************************************************************************/
    /*free allocated memory*/
    for(i=0;i<count;i++){
        free(dict[i]);
    }
    free(dict);

    /*close files*/
    fclose(pInFile);
    fclose(pOutFile);

}
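The code above calls dictWordCount and loadDictionary, but the question does not show them. Here is a hypothetical sketch matching those prototypes, assuming dictionary.txt holds whitespace-separated (for example one per line) lowercase words, already sorted in strcmp order, none longer than MAX_STRING_SIZE - 1 characters. It relies on main calling dictWordCount first, which rewinds the file so loadDictionary can read it again from the start.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 29  /* same limit as in the question */

/* Hypothetical: counts whitespace-separated words, then rewinds the
   file so that loadDictionary can read it from the beginning. */
int dictWordCount(FILE *ptrF)
{
    char buf[MAX_STRING_SIZE];
    int n = 0;

    while (fscanf(ptrF, "%28s", buf) == 1)   /* 28 = MAX_STRING_SIZE - 1 */
        n++;
    rewind(ptrF);
    return n;
}

/* Hypothetical: allocates an array of counter strings and copies each
   word into its own heap buffer, so main can free dict[i] and dict. */
void loadDictionary(char ***pArray1, FILE *ptrFile, int counter)
{
    char buf[MAX_STRING_SIZE];
    char **arr = malloc((size_t)counter * sizeof *arr);
    int i;

    if (arr == NULL && counter > 0) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    for (i = 0; i < counter; i++) {
        if (fscanf(ptrFile, "%28s", buf) != 1)
            buf[0] = '\0';                   /* file got shorter: keep every slot valid */
        arr[i] = malloc(strlen(buf) + 1);
        if (arr[i] == NULL) {
            perror("malloc");
            exit(EXIT_FAILURE);
        }
        strcpy(arr[i], buf);
    }
    *pArray1 = arr;
}
```

Because both functions read with the same "%28s" conversion, they split the file into the same tokens, so the count and the loaded entries stay consistent.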

2 answers:

Answer 0 (score: 1)

I'm not 100% sure I've understood your question correctly, but I'll give it a try.

First, your loop

do{
    ch = fgetc(pInFile);
    /* do stuff */
}while(ch != EOF);

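also runs once more when the end of the file has been reached. So if the last byte of the file is alphabetic, you print an unwanted EOF byte into the output file. Alternatively, since you cast ch to unsigned char when passing it to isalpha(), which typically yields 255 [for EOF == -1 and 8-bit chars], in some locales (en_US.iso885915, for example) that value counts as an alphabetic character, and then the last word of the input file is never written at all.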

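To fix this, first, don't cast ch to unsigned char when passing it to isalpha(), and then add some logic to the loop so that EOF is never processed by accident. Where a character has to be printed right after a word, I chose to replace EOF with a newline, since that's simple.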

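Then also print the non-alphabetic characters that do not immediately follow a word (your loop currently drops them, which is why the space after punctuation disappears):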

do{
    ch = fgetc(pInFile);                /*read char from file*/

    if(isalpha(ch)){                    /*if char is alphabetical char*/
        //printf("char is: %c\n", ch);
        input[numChar] = ch;            /*put char into input array*/
        temp[numChar] = tolower(ch);    /*put char in temp in lowercase for dictionary check*/
        numChar++;                      /*increment char array element counter*/
    }
    else{
        if(numChar != 0){
            input[numChar] = '\0';  /*add end of string char*/
            temp[numChar] = '\0';

            dictElement = binarySearch(dict,0,count-1,temp);    /*check if word is in dictionary*/

            if(dictElement == -1){  /*word not in dictionary*/
                fprintf(pOutFile,"**%s**%c", input, (ch == EOF) ? '\n' : ch);
            }
            else{   /*word is in dictionary*/
                fprintf(pOutFile, "%s%c", input, (ch == EOF) ? '\n' : ch);
            }
            numChar = 0;    /*reset numChar for next word*/
        }
        else
        {
            if (ch != EOF) {
                fprintf(pOutFile, "%c",ch);
            }
        }
    }
}while(ch != EOF);
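The cast problem can also be avoided with the usual C idiom: declare ch as int, the type fgetc actually returns, and test for EOF before using the character. Every real byte then arrives as a value in 0..UCHAR_MAX, which is always a valid isalpha() argument, and a 0xFF byte can no longer be mistaken for EOF (with a plain char, a signed char turns 0xFF into -1, and an unsigned char makes ch != EOF always true). A minimal sketch; countLetters is a hypothetical helper, not part of the question's code:

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the alphabetic characters in a stream. */
int countLetters(FILE *in)
{
    int ch;     /* int, not char: fgetc returns an unsigned char value or EOF */
    int n = 0;

    while ((ch = fgetc(in)) != EOF) {   /* EOF is checked before ch is used */
        if (isalpha(ch))                /* valid argument: ch is 0..UCHAR_MAX here */
            n++;
    }
    return n;
}
```

The same declaration works unchanged in the spell-check loop: with int ch, isalpha(ch) and tolower(ch) need no cast, and a (char) conversion is only needed when the letter is stored into input[].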

Answer 1 (score: 0)

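As it stands, a non-alphabetic char triggers the else block of if(isalpha((unsigned char)ch)){, and unless it immediately follows a word, the character itself is ignored.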

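If you add a statement that simply prints every non-alphabetic character, I think it will do what you want. It goes in the else block of if(isalpha((unsigned char)ch)){, after the if(numChar != 0) block, and is just a simple fprintf statement. For this to work, remove the %c and its ch argument from the two fprintf calls inside the if(numChar != 0) block (otherwise the character that ends a word is printed twice), and only print when ch != EOF.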
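A self-contained sketch of this suggestion, with the adjustments it needs: the word is written without the character that ended it, every non-alphabetic character is then written on its own, and EOF is never written. It also declares ch as int, bounds the word buffers at MAX_STRING_SIZE - 1 letters (extra letters of an over-long word are dropped rather than overflowing the arrays), and, to stay self-contained, swaps the question's binarySearch for the standard library's bsearch over the same sorted, lowercase dictionary array. The names markMisspelled and cmpWord are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 29  /* same limit as in the question */

/* bsearch comparator: key is the lowercase word, elem points to a char * */
static int cmpWord(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(char *const *)elem);
}

/* Copies in to out, wrapping words not found in dict as **word**.
   dict must hold count lowercase words sorted in strcmp order. */
void markMisspelled(FILE *in, FILE *out, char **dict, int count)
{
    char input[MAX_STRING_SIZE];   /* word as it appears in the text */
    char temp[MAX_STRING_SIZE];    /* lowercase copy for the lookup */
    int numChar = 0;
    int ch;                        /* int, so EOF cannot clash with a real byte */

    do {
        ch = fgetc(in);
        if (ch != EOF && isalpha(ch)) {
            if (numChar < MAX_STRING_SIZE - 1) {   /* drop letters beyond the buffer */
                input[numChar] = (char)ch;
                temp[numChar] = (char)tolower(ch);
                numChar++;
            }
        } else {
            if (numChar != 0) {                    /* a word just ended: emit it */
                input[numChar] = '\0';
                temp[numChar] = '\0';
                if (count > 0 &&
                    bsearch(temp, dict, (size_t)count, sizeof *dict, cmpWord) != NULL)
                    fputs(input, out);             /* known word, unchanged */
                else
                    fprintf(out, "**%s**", input); /* unknown word, highlighted */
                numChar = 0;
            }
            if (ch != EOF)
                fputc(ch, out);                    /* separator itself, always kept */
        }
    } while (ch != EOF);
}
```

For example, with a dictionary of a, is, this and word, the input `This is a worng! word, ok.` comes out as `This is a **worng**! word, **ok**.`, with the space after the exclamation mark preserved. A word that ends exactly at end of file is still flushed, because EOF takes the else branch.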