Strange additions - counting the number of times a word appears in a file

Time: 2012-12-15 02:57:43

Tags: c++ find counter

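OK, so my simple project is supposed to find every occurrence of a specific string in a .txt file. Case matters, and a match still counts when the word appears inside another word.

For example, if the word is "the":

Valid finds include: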

the apple = 1;

thespian = 2;

Invalid finds include:

th e elephant (a space between "th" and "e")

The apple (capitalized)

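If the word IS found in a line of the file, I should print that line ONCE. If it is not found, I should not print the line at all.

So, for example, one run of my program should output: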

Searching for 'the' in file 'test.txt'
2 : that they do not use permeates the [C++] language.  Another example
3 : will further illustrate this influence.  Imagine that an integer
5 : What bit value should be moved into the topmost position?  If we
6 : look at the machine level, architectural designers are divided on
8 : the most significant bit position, while on other machines the sign
9 : bit (which, in the case of a negative number, will be 1) is extended.
10 : Either case can be simulated by the other, using software, by means
# occurrences of 'the' = 13

Unfortunately, I am getting:

Searching for 'the' in the file 'test.txt'
2: that they do not use permeates the [C++] language.  Another example
3: will further illustrate this influence.  Imagine that an integer
5: What bit value should be moved into the topmost position?  If we
6: look at the machine level, architectural designers are divided on
8: the most significant bit position, while on other machines the sign
9: bit (which, in the case of a negative number, will be 1) is extended.
10: Either case can be simulated by the other, using software, by means
11: of a combination of tests and masks.
12: 
# occurrences of 'the' = 15

I don't understand why it thinks it found "the" on lines 11 and 12.

Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

int main(int argc, char* argv[]){
//a char pointer is a c-string
//the array is just an array of char pointers
//argv[0] = pointer to the word to search for
//argv[1] = pointer to fileNames

//includes program name @ 0, so three args
if (argc == 3){

    int wordCounter = 0;

    ifstream myFile(argv[2]);

    if (!myFile){
        cout << "File '" << argv[2] << "' could not be opened" << endl;
        return 1;
    }

    else {
        //counts the number of lines in file
        int counter = 0;

        //holds the new line in the file
        char line[100];

        //copies string into buffer that is length of word
        const char * word = argv[1];

        //holds whether found word
        bool found = false;

        cout << "Searching for '" << word << "' in the file '" << argv[2] << "'" << endl;

        //number of chars in a line
        int numChar = 0;

        //saves every line
        while (!(myFile.getline(line, 100)).eof()) {
            //starts every new new at not having found the word
            found = false;
            //read in new line, so increases line counter
            counter ++;
            numChar = 0;

            //find length of line
            for (int i = 0; line[i] != '\n' && i < 101; i++){
                numChar++;
            }

            //finds how many times the key word appears in one line
            //checks up to a few before the end of the line for the word
            if (numChar >= strlen(argv[1])){
                for (int i = 0; i < numChar - strlen(argv[1]); i++){

                    //if the current line letter equals the first letter of the key word
                    if (line[i] == word[0]){

                        //continue looking forward to see if the rest of it match
                        for (int j = 0; j < strlen(argv[1]); j++){

                            //if word doesn't match break
                            if (word[j] != line [i+j]){
                                break;
                            }

                            //if matches all the way to end, add counter
                            if(j == strlen(argv[1]) - 1){
                                wordCounter++;
                                found = true;
                            }
                        }//end 2ndfor
                    }
                }//end 1stfor

                //if the key word has been found, print the line
                if (found){
                    cout << counter << ": " << line << endl;
                }
            }
        }//endwhile

        cout << "# occurrences of '" << word << "' = " << wordCounter << endl;
        myFile.close();
    }//end else
}//end if
return 0;
}//end main

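2 answers:

Answer 0: (score: 0)

  • getline(ifstream&, string) is a better fit for reading lines. Some lines may be longer than 100 characters (spaces included), which throws off the count with a fixed-size buffer. This function reads into a string until it reaches the end of the line.
  • You are not looping over the file correctly, which may cause undefined behavior and, in this case, produces your extra lines. A correct file loop would be: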

//program counts the number of lines in a file
//line is a std::string here, not a char array
getline(myFile,line); //grab the first line
while(myFile) //while the last read succeeded
{
    //manipulate line string
    lineCount++;
    getline(myFile,line); //read in the next line
}

Answer 1: (score: 0)

The reason your program thinks there is a "the" on lines 11 and 12 is this loop:

for (int i = 0; line[i] != '\n' && i < 101; i++)

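You check for a newline (which, by the way, is never stored in the buffer), but not for the terminating 0. So you scan all 100 characters, actually one more, because you also read line[100], which does not exist, and you count the "the"s left over in the buffer from previous lines.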

for (int i = 0; i < 100 && line[i] != '\0' && line[i] != '\n'; i++)

should fix that.

Check the validity of the index first, to avoid undefined behavior from an invalid memory access.
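
As an illustration of the same idea, here is a small sketch that keeps the char buffer but measures the line with strlen, which stops at the terminating 0. The function name countInBuffer is hypothetical. It assumes the buffer was filled by a successful getline call and is therefore null-terminated.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts case-sensitive occurrences of `word` in the
// null-terminated string `line`. Nothing past the terminator is read, so
// leftovers from a longer, earlier line are ignored.
std::size_t countInBuffer(const char* line, const char* word) {
    const std::size_t lineLen = std::strlen(line);  // stops at '\0'
    const std::size_t wordLen = std::strlen(word);
    if (wordLen == 0 || wordLen > lineLen) return 0;

    std::size_t count = 0;
    // i + wordLen <= lineLen keeps every compared character inside the
    // line and still includes a match that ends at the last character.
    for (std::size_t i = 0; i + wordLen <= lineLen; ++i) {
        if (std::strncmp(line + i, word, wordLen) == 0) {
            ++count;
        }
    }
    return count;
}
```

Note that the loop bound differs slightly from the one in the question: i < numChar - strlen(argv[1]) stops one position early, so it would miss a match at the very end of a line.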