Question

I am writing a program that reads a .txt file and generates text based on the probability of each word in the file.

For texts with more than 10000 distinct words, I only use the 10000 words with the most occurrences. I am pasting the code snippets I think are relevant:

vector<pair<int,string>> sorted_words; //Vector with all the words from the .txt associated with the number of occurrences in the .txt, sorted by occurrences (value)
map<string,float> used_words; //map with the words associated with their percentage
int total_count=0; //total words in the .txt
int partial_count=0;  //number of occurrences of the words that are going to be taken into account

/* Code where I calculate total_count and fill and sort sorted_words */

//dumping the 10000 pairs with more occurrences in a map, replacing the number of occurrences with the probabilities
if(sorted_words.size()>=10000){
    vector<pair<int,string>>::iterator it = sorted_words.begin();
    for(int used_count=0; used_count<10000; used_count++) {
        partial_count+=it->first;
        it++;
    }
    it = sorted_words.begin();
    for(int used_count=0; used_count<10000; used_count++) {
        used_words.insert(make_pair(it->second,(float)it->first/partial_count));
        it++;
    }
}
else{
    partial_count = total_count;
    for(auto i: sorted_words) {
        used_words.insert(make_pair(i.second,(float)i.first/total_count));
    }
}
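
For reference, the two branches above can be folded into a single helper. This is only a minimal sketch, not the code from the post: the function name `top_word_probabilities` and the `limit` parameter are hypothetical, and it assumes `sorted_words` is already sorted by descending count and that `total_count` equals the sum of all counts (so dividing by the partial sum gives the same result as the `else` branch).

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): keeps at most `limit`
// entries from the front of `sorted_words` and converts their counts into
// probabilities relative to the kept words only.
// Assumes `sorted_words` is already sorted by descending count.
std::map<std::string, float> top_word_probabilities(
    const std::vector<std::pair<int, std::string>>& sorted_words,
    std::size_t limit)
{
    const std::size_t used = std::min(limit, sorted_words.size());

    // First pass: total occurrences of the words that are kept.
    long long partial_count = 0;
    for (std::size_t i = 0; i < used; ++i) {
        partial_count += sorted_words[i].first;
    }

    std::map<std::string, float> used_words;
    if (partial_count == 0) {
        return used_words;  // nothing to normalise
    }

    // Second pass: store each kept word with its share of partial_count.
    for (std::size_t i = 0; i < used; ++i) {
        used_words.emplace(
            sorted_words[i].second,
            static_cast<float>(sorted_words[i].first) / partial_count);
    }
    return used_words;
}
```

Calling `top_word_probabilities(sorted_words, 10000)` would cover both the large and the small file case with the same code path, which also makes it easier to rule out a difference between the two branches.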

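When the .txt file has fewer than 10000 distinct words, my console prints the Spanish characters correctly, but when it has more, my console prints square blocks. I suspect this has to do with the way the iterator accesses the string. Why does this happen?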
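
One way to check that suspicion is to look at the raw bytes of the same word in both runs. The sketch below is a hypothetical debugging helper (`to_hex_bytes` is not part of the original program). If a word such as "año" produces identical bytes in the small-file and large-file runs, the strings themselves are probably intact and the difference lies in how the console interprets the output. For orientation, "ñ" is the two bytes c3 b1 in UTF-8 and the single byte f1 in Latin-1 / Windows-1252.

```cpp
#include <cstddef>
#include <string>

// Hypothetical debugging helper: returns the bytes of `s` as lowercase,
// space-separated hex pairs, so the stored encoding can be inspected
// without depending on how the console renders non-ASCII characters.
std::string to_hex_bytes(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0f];  // low nibble
    }
    return out;
}
```

Printing `to_hex_bytes(it->second)` for a few words right before they are inserted into `used_words` would show whether the bytes differ between the two cases.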

Edit: screenshots of the problem:

.txt with more than 10000 different words

.txt with less than 10000 different words

C++: Windows Bash console sometimes displays Spanish characters as blocks

0 answers: