Need help fixing code to output the correct unique word count from a file

Time: 2019-01-23 23:02:13

Tags: c++ arrays string word-count

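I'm trying to find the number of unique words in a text file, but for some reason my count is always off. My regular word count is fine.

My string array wordArr contains all of the words in the file.

Once a word is found to be unique, I try to assign it to another array, then loop through the list of all words to see whether it matches the current word. If it matches, oldWord is set to true and the word is not counted toward my unique count.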

//New portion
int main(int argc, char *argv[]) {
    //File Paths
    ifstream fp;
    fp.open(argv[1]);
    if (fp.fail()) {
        cout << "Error No file" << endl;
        return 0;
    }
    string wordArr[10000];
    string words;
    string temp;
    int wordCount = 0;


    while (fp >> words) {
        int newWord = 0;
        for (int i; i < words.length(); i++) {
            if (isalpha(words[i])) {

            } else {
                wordArr[wordCount++] = words.substr(0, i);
                //wordCount++;
                newWord = 1;
                if(words[i] + 1 != '\0') {
                    for (int j = i + 1; j <  words.length(); j++) {
                        temp = temp +words[j];
                    }
                    wordArr[wordCount++] = temp;
                    //wordCount++;
                }

            }

        }
        if (newWord == 0) {
            wordArr[wordCount] = words;
            wordCount++;
        }
    }
    cout << "Number of words found was: " << wordCount << endl;
    //New portion


    // makes all lower
    for(int k=0; k<wordCount;k++){ //need to find size of array
        for(int l=0; l<wordArr[k].length(); l++){
            tolower(wordArr[k].at(l));
        }

    }



    //unique count
    string tempArr[10000];
    int unique=0;
    int oldWord=0;
    for(int m=0; m<wordCount;m++ ) {
        for (int n = 0; n < wordCount; n++) {
            if (wordArr[m] == tempArr[n]) {


                oldWord = 1;
            }
        }
        if(oldWord==0){
            wordArr[m] = tempArr[n];
            unique++;
        }
    }
    cout << "Unique word count is: " << unique << endl;
}

I expect 52 unique words from the test case, but I only get 37.

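Test case:

Cryptography is both the practice and study of the techniques used to communicate and/or store information or data privately and securely, without being intercepted by third parties. This can include processes such as encryption, hashing, and steganography. Until the modern era, cryptography referred almost exclusively to encryption, but now cryptography is a broad field with applications in many critical areas of our lives.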

2 answers:

Answer 0 (score: 0)

You need to reset oldWord on every iteration:

for(int m=0; m<wordCount;m++ ) {
    oldWord=0;
    // ... rest of the loop unchanged
}

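Answer 1 (score: 0)

Your parsing logic is wrong (in fact, the code doesn't even compile). There are logic errors in how you split words on non-alphabetic characters, in how you find and track duplicate words, and even in how you lowercase the words.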

In short, the whole program is full of mistakes that need fixing. Try something more like this:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
using namespace std;

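int main(int argc, char *argv[]) {
    // the input file path is the first command-line argument
    if (argc < 2) {
        cout << "Usage: " << argv[0] << " <file>" << endl;
        return 1;
    }
    ifstream fp;
    fp.open(argv[1]);
    if (!fp.is_open()) {
        cout << "Error No file" << endl;
        return 1;
    }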

    string wordArr[10000];
    string words;
    int wordCount = 0;
    while ((fp >> words) && (wordCount < 10000)) {
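        for (int i = 0; i < (int)words.length(); ++i) {
            if (!isalpha((unsigned char)words[i])) {
                // keep the letters before the separator (skip an empty prefix, e.g. "(word")
                if (i > 0) {
                    wordArr[wordCount++] = words.substr(0, i);
                    if (wordCount == 10000) break;
                }
                // skip the whole run of non-letters, drop what was consumed,
                // and rescan the rest of the token from the start
                ++i;
                while ((i < (int)words.length()) && (!isalpha((unsigned char)words[i]))) {
                    ++i;
                }
                words.erase(0, i);
                i = -1;
            }
        }
        // whatever is left is the last (or only) word of the token
        if ((words.length() > 0) && (wordCount < 10000)) {
            wordArr[wordCount++] = words;
        }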
    }
    cout << "Number of words found was: " << wordCount << endl;

    // make every word lowercase; tolower returns the converted character,
    // so the result must be assigned back
    for(int k=0; k<wordCount;k++){
        for(int l=0; l<(int)wordArr[k].length(); l++){
            wordArr[k][l] = tolower((unsigned char)wordArr[k][l]);
        }
    }

    //unique count
    string tempArr[10000];
    int unique=0;
    for(int m=0; m<wordCount;m++ ) {
        int oldWord=0;
        for (int n = 0; n < unique; n++) {
            if (wordArr[m] == tempArr[n]) {
                oldWord = 1;
                break;
            }
        }
        if(oldWord==0){
            tempArr[unique++] = wordArr[m];
        }
    }
    cout << "Unique word count is: " << unique << endl;
}

Now the code works as expected:

Number of words found was: 64
Unique word count is: 52