Creating a dictionary from a text file

Time: 2013-04-28 00:05:48

Tags: c++ vector struct

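This code is supposed to print every word in the file along with the number of times it occurs (edit: ignoring upper/lower case differences). At the moment it does not do this correctly. Is that because of some whitespace or punctuation issue?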

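    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    using namespace std;

    struct entry
    {
        string word;
        int count;
    };

    // Declared before main() so that the call below compiles.
    void make_dictionary(istream& file, vector<entry>& dict);

    int main()
    {
        ifstream input1;
        input1.open("Base_text.txt");

        if (input1.fail())
        {
            cout << "Input file 1 opening failed." << endl;
            exit(1);
        }

        ifstream input2;
        input2.open("Test_file.txt");

        if (input2.fail())
        {
            cout << "Input file 2 opening failed." << endl;
            exit(1);
        }

        vector<entry> base;

        make_dictionary(input1, base);

        // Print each stored word with its count.
        for (size_t i = 0; i < base.size(); i++)
        {
            cout << base[i].word << ": " << base[i].count << endl;
        }
    }

    void make_dictionary(istream& file, vector<entry>& dict)
    {
        string word;

        // operator>> splits on whitespace only, so punctuation stays attached.
        while (file >> word)
        {
            bool found = false;

            // Linear search for an exact (case-sensitive) match.
            for (size_t i = 0; i < dict.size(); i++)
            {
                if (dict[i].word == word)
                {
                    dict[i].count++;
                    found = true;
                }
            }

            if (!found)
            {
                entry ent;
                ent.word = word;
                ent.count = 1;
                dict.push_back(ent);
            }
        }
    }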

Input:

This is some simple base text to use for comparison with other files.
You may use your own if you so choose; your program shouldn't actually care.
For getting interesting results, longer passages of text may be useful.
In theory, a full novel might work, although it will likely be somewhat slow.

Current (incorrect) output:

This: 1
is: 1
some: 1
simple: 1
base: 1
text: 2
to: 1
use: 2
for: 1
comparison: 1
with: 1
other: 1
files.: 1
You: 1
may: 2
your: 2
own: 1
if: 1
you: 1
so: 1
choose;: 1
program: 1
shouldn't: 1
actually: 1
care.: 1
For: 1
getting: 1
interesting: 1
results,: 1
longer: 1
passages: 1
of: 1
be: 2
useful.: 1
In: 1
theory,: 1
a: 1
full: 1
novel: 1
might: 1
work,: 1
although: 1
it: 1
will: 1
likely: 1
somewhat: 1
slow.: 1

We are not allowed to use maps (std::map) in this project. Any ideas about where I went wrong?

1 answer:

Answer 0 (score: 0)

If case should be ignored, convert each word to lowercase right after reading it. Then strip the trailing punctuation. For example:

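// Needs <algorithm> and <cctype>.
while (file>>word)
{
    // Lowercase via unsigned char: tolower is undefined for negative char values.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Drop trailing ',', ';' and '.'. Calling erase(find_last_of(...), 1) when the
    // character is absent passes npos and throws std::out_of_range.
    std::string::size_type end = word.find_last_not_of(",;.");
    word.erase(end == std::string::npos ? 0 : end + 1);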
    ...