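I'm writing a program that reads a .txt file and generates text based on the probabilities of the words in the file.
For texts with more than 10000 distinct words, I only use the 10000 words that occur most often. Here are the code snippets I think are relevant: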
```cpp
vector<pair<int,string>> sorted_words; //Vector with all the words from the .txt associated with the number of occurrences in the .txt, sorted by occurrences (value)
map<string,float> used_words; //map with the words associated with their percentage
int total_count=0; //total words in the .txt
int partial_count=0; //number of occurrences of the words that are going to be taken into account

/* Code where I calculate total_count and fill and sort sorted_words */

//dumping the 10000 pairs with the most occurrences into a map, replacing the number of occurrences with the probabilities
if(sorted_words.size()>=10000){
    vector<pair<int,string>>::iterator it = sorted_words.begin();
    for(int used_count=0; used_count<10000; used_count++) {
        partial_count+=it->first;
        it++;
    }
    it = sorted_words.begin();
    for(int used_count=0; used_count<10000; used_count++) {
        used_words.insert(make_pair(it->second,(float)it->first/partial_count));
        it++;
    }
}
else{
    partial_count = total_count;
    for(auto i: sorted_words) {
        used_words.insert(make_pair(i.second,(float)i.first/total_count));
    }
}
```
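The selection described in the question (keep the 10000 most frequent words and divide each count by the total of the kept counts) can be sketched as a standalone function. This is a minimal sketch, not the asker's code: `top_word_probabilities` and `sample_word` are hypothetical names, and the sketch assumes the counts arrive as the same (count, word) pairs as `sorted_words`. One detail worth checking in the elided sorting code: `std::sort` with the default comparison orders `pair<int,string>` ascending, so `begin()` would point at the *least* frequent word. The sketch sorts with `std::greater` so the first elements are the most frequent. The second function shows one possible way to draw words according to those probabilities, using `std::discrete_distribution`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keeps the `limit` most frequent words from
// (count, word) pairs and maps each kept word to count / (sum of kept counts).
std::map<std::string, double> top_word_probabilities(
    std::vector<std::pair<int, std::string>> counts, std::size_t limit) {
    // Largest count first; the default operator< would sort ascending.
    std::sort(counts.begin(), counts.end(),
              std::greater<std::pair<int, std::string>>());
    const std::size_t kept = std::min(limit, counts.size());

    long long partial_count = 0;  // occurrences of the kept words only
    for (std::size_t i = 0; i < kept; ++i) partial_count += counts[i].first;

    std::map<std::string, double> probabilities;
    for (std::size_t i = 0; i < kept; ++i) {
        probabilities[counts[i].second] =
            static_cast<double>(counts[i].first) / partial_count;
    }
    return probabilities;
}

// Hypothetical helper: draws one word with probability proportional to its
// value in `probabilities`.
std::string sample_word(const std::map<std::string, double>& probabilities,
                        std::mt19937& rng) {
    assert(!probabilities.empty());  // indexing `words` below needs >= 1 entry
    std::vector<std::string> words;
    std::vector<double> weights;
    for (const auto& entry : probabilities) {
        words.push_back(entry.first);
        weights.push_back(entry.second);
    }
    std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
    return words[pick(rng)];
}
```

With `limit` set to 10000, the same function covers both branches of the original `if`: when there are fewer than 10000 distinct words, every word is kept, and the divisor equals `total_count` provided `total_count` is the sum of all counts.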
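When the .txt file has fewer than 10000 distinct words, my console prints Spanish characters correctly, but when it has more, it prints square blocks instead. I suspect it has something to do with the way the iterator retrieves the string. Why does this happen?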
Edit: screenshot of the problem:
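Square blocks in a console often mean it received bytes it could not decode as characters. As a diagnostic sketch, assuming the .txt file is UTF-8 encoded (the question does not say), the functions below check whether each key in `used_words` is still well-formed UTF-8. In UTF-8 a Spanish letter such as ñ occupies two bytes (0xC3 0xB1), so any step that splits or edits a `std::string` byte by byte can leave half a character behind. `is_valid_utf8` and `invalid_words` are hypothetical helpers, and the check is deliberately simplified: it does not reject overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: true if every lead byte in `s` is followed by the
// right number of 10xxxxxx continuation bytes (simplified UTF-8 check).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;                 // ASCII byte
        else if ((c & 0xE0) == 0xC0) extra = 1;  // 110xxxxx, e.g. 0xC3 in "ñ"
        else if ((c & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;                       // stray continuation or invalid byte
        if (extra > s.size() - i - 1) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) return false;  // expected 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: collects the keys of a map shaped like `used_words`
// that are not well-formed UTF-8.
std::vector<std::string> invalid_words(const std::map<std::string, float>& used_words) {
    std::vector<std::string> bad;
    for (const auto& entry : used_words) {
        if (!is_valid_utf8(entry.first)) bad.push_back(entry.first);
    }
    return bad;
}
```

Calling `invalid_words(used_words)` once with fewer and once with more than 10000 distinct words would show whether the broken characters are already in the stored strings or appear only when they are printed.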