Question

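The code is meant to remove from a text file every word that appears in the useless array. I have a very strange problem: the code does not remove the word 'the' from the phrase 'waiting on the shelf', yet all of my other test cases (and there are many) pass. Any ideas?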

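#include <algorithm>
#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main(){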
    string useless[20] = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along"};

    ifstream fin("input.txt");
    if(fin.fail()){
        cout << "Input failed to open" << endl;
        exit(-1);
    }

    string line;
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);
    getline(fin, line);

    ofstream fout("output.txt");

    while(getline(fin, line)){
        vector<string> vec;
        istringstream iss(line);
        while (iss) {
            string word;
            iss >> word;
            transform(word.begin(), word.end(), word.begin(), ::tolower);
            vec.push_back(word);
        }

        for(int i = 0; i < vec.size(); i++){
            for(int j = 0; j < 20; j++){
                if(vec[i] == useless[j]){
                    vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
                }
            }
            fout << vec[i] << " ";
        }
        fout << endl;
    }
}

Answer 1

The iteration here is wrong:

    for(int i = 0; i < vec.size(); i++){
        for(int j = 0; j < 20; j++){
            if(vec[i] == useless[j]){
                vec.erase(remove(vec.begin(), vec.end(), vec[i]), vec.end());
            }
        }
        fout << vec[i] << " ";
    }
    fout << endl;

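Before this loop runs, the vector holds the following values: [waiting] [on] [the] [shelf]. When i == 1 you erase "on" from the vector, which leaves [waiting] [the] [shelf], but the index i is still 1. The erase shifted the remaining elements, so "the" now sits in the position that "on" used to occupy. The inner loop does not restart: it carries on from the entry after "on" in useless, and "the" comes earlier in that array, so "the" is never matched. It is written to the output, i moves on to 2, and the word has effectively been skipped.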

You can use remove_if instead. For example:

 vec.erase(remove_if(vec.begin(), vec.end(), [&]( const string& str )
 {
     return std::find(begin(useless), end(useless), str ) != end(useless);
 }), vec.end());

After that you have a filtered vector that contains no words from the useless array.

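By the way, this can be optimized. The algorithm above has complexity O(vec_size * useless_size), and it can be reduced to O(vec_size) on average. Use a hash set (unordered_set) instead of the array: it gives you constant-time lookup on average.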

 unordered_set<string> useless = { "an", "the" , "of", "to", "and", "but", "nor", "or", "some", "any", "very", "in", "on", "at", "before", "after", "into", "over", "through", "along" };
 ...
 vec.erase(remove_if(vec.begin(), vec.end(), [&](const string& str)
 {
     return  useless.find(str) != useless.end();
 }), vec.end());
