Linked list word frequency and sorting C++

Time: 2014-03-03 08:31:20

Tags: c++ sorting linked-list word-frequency word-cloud

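I'm writing a program that reads words from a text file and puts all of the words into a linked list. The file has no punctuation, only words. I also want to compare the linked list against a preloaded blacklist, which is also a linked list.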
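For the blacklist comparison, a plain linear search over the blacklist is enough for small lists. A minimal sketch, assuming both lists store lowercased words in nodes shaped like the question's `wordNode` (`myWord`, `freq_count`, `next`); `isBlacklisted` is a hypothetical helper, not part of the original class:

```cpp
#include <cassert>
#include <string>

// Assumed to mirror the question's wordNode.
struct wordNode {
    std::string myWord;
    int freq_count = 1;
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) {}
};

// Hypothetical helper: returns true if `word` appears anywhere in the blacklist.
// Linear search, so each lookup costs O(length of the blacklist).
bool isBlacklisted(const wordNode* blacklist, const std::string& word) {
    for (const wordNode* b = blacklist; b != nullptr; b = b->next) {
        if (b->myWord == word)
            return true;
    }
    return false;
}
```

One possible design is to check each word before counting it, e.g. `if (!isBlacklisted(blacklistHead, w)) insertWordDistinct(w);`, where `blacklistHead` is a hypothetical pointer to the loaded blacklist.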

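So far I can load the linked list from a file, print the linked list, check its size, count how often each word appears in the file, skip printing words below a specified frequency, and convert all words to lowercase for easier handling.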

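The problem I'm having is getting the code right so that a word that occurs more than once is printed only once. So if the word "the" appears 20 times, I don't want it to print "the <1>", then "the <2>" the next time it shows up, and so on all the way up to "the <20>". I just want it to print "the <20>" once.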

I've posted my load-file functions, print functions, and insert-word functions, all members of class wordCloud.

Here is the code:

void wordCloud::insertWord(string aWord){
    wordNode *newWord = new wordNode(aWord);

    //old code
    if (head == NULL)
        head = newWord;
    else{
        newWord->next = head;
        head = newWord;
    }

    //revised code
    //newWord->next = head;
    //head = newWord;
    size++;
}

void wordCloud::insertWordDistinct(string word){
    for (wordNode *temp = head; temp != NULL; temp = temp->next){
        if (word == temp->myWord){
            temp->freq_count++;
            //cout << temp->freq_count; //for debugging
        }
    }
    insertWord(word);
}

void wordCloud::printWordCloud(int freq){
    wordNode *temp, *previous;
    int listSize = 0;

    if (head == NULL)                   //determines if there are any words in the list
        cout << "No Word Cloud" << endl;
    else{
        temp = head;

        while (temp->next != NULL){         //prints each word until the list is NULL
            if (temp->freq_count >= freq){
                cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
                temp = temp->next;
                listSize++;
            }
            else{
                previous = temp;
                temp = temp->next;
                previous = NULL;
                free(previous);
            }
        }
    }
    cout << "\nThere are " << size << " words in the file.\n";      //print file size - for debugging - works
    cout << "\nThere are " << listSize << " words in the list\n\n";     //print list size - for debugging - works
    system("pause");
}

void wordCloud::printBlacklist(){
    wordNode *temp;

    if (head == NULL)                   //determines if there is a list
        cout << "No Words in the blacklist" << endl;
    else{
        temp = head;

        while (temp != NULL){           //prints each word until the list is NULL
            cout << temp->myWord << endl;
            temp = temp->next;
        }
    }
    cout << "\nThere are " << size << " words in the file.\n\n";        //print size - for debugging - works
    system("pause");
}

void wordCloud::loadWordCloud(string fileName){
    ifstream file;                      //variable for fileName
    string word;                        //string to hold each word

    file.open(fileName);                //open file

    if (!file) {                        //error handling
        cout << "Error: Can't open the file. File may not exist.\n";
        exit(1);
    }

    while (!file.eof()){
        file >> word;                   //grab a word from the file one at a time

        insertWordDistinct(changeToLowerCase(word));
        //insertWord(word);             //for debugging
        //cout << word <<'\n';          //print word - for debugging
    }

    //printWordCloud();                 //print word cloud - for debugging - works
    file.close();                       //always make sure to close file after read
}

void wordCloud::loadBlacklist(string fileName){
    ifstream file;                      //variable for fileName
    string bannedWord;                  //string to hold each word

    file.open(fileName);                //open file

    if (!file) {                        //error handling if file does not load
        cout << "Error: Can't open the file. File may not exist.\n";
        exit(1);
    }

    while (!file.eof()){
        file >> bannedWord;             //grab a word from the file one at a time

        if (bannedWord.empty()){        //error handling if file is empty
            cout << "File is empty!!\n";
            exit(1);
        }
        insertWord(changeToLowerCase(bannedWord));
        //cout << bannedWord << '\n';   //print blacklist words - for debugging
    }

    //printBlacklist();                 //print blacklist - for debugging - works
    file.close();                       //always make sure to close file after read
}
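Both load functions loop on `while (!file.eof())`. eof() only becomes true after a read has already run into the end of the file, so when the file ends with whitespace such as a trailing newline, the final `file >> word` fails, `word` keeps its previous contents, and the last word is processed a second time. The usual idiom is to test the extraction itself. A minimal sketch, assuming words only need to be lowercased and collected; `readWords` and `toLowerCopy` are hypothetical names, and a std::vector stands in for the linked list so the loop can be shown on its own:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's changeToLowerCase().
std::string toLowerCopy(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Reads whitespace-separated words until an extraction fails.
// Because the stream is tested after each >>, a failed final read
// is never treated as a word.
std::vector<std::string> readWords(std::istream& in) {
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(toLowerCopy(word));
    return words;
}
```

In loadWordCloud the same pattern would be `while (file >> word) insertWordDistinct(changeToLowerCase(word));`, and loadBlacklist can use `while (file >> bannedWord)` in the same way.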

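I noticed that if I call free() after setting previous = NULL, as in the code above, my program doesn't crash and I don't get any DLL memory-handling errors. In fact, I can remove the free() entirely and it seems to work just fine. I just don't know whether that is the correct way to do it. It seems to me that just pointing a node at NULL doesn't necessarily delete its data from memory, and I'm uneasy about not using free() or delete to get rid of the node. Please correct me if I'm wrong, or point me in the right direction.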
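On the memory question: setting `previous` to NULL and then calling `free(previous)` is the same as `free(NULL)`, which does nothing, so no node is ever released; and because only a local pointer changed, the node is still linked into the list. Nodes created with `new` must be released with `delete` (not `free()`), and only after they have been unlinked. A minimal sketch, assuming a node shaped like the question's `wordNode`; `removeBelow` is a hypothetical helper, not part of the original class:

```cpp
#include <cassert>
#include <string>

// Assumed to mirror the question's wordNode.
struct wordNode {
    std::string myWord;
    int freq_count = 1;
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) {}
};

// Hypothetical helper: unlinks and deletes every node whose freq_count
// is below minFreq. Returns how many nodes were removed.
int removeBelow(wordNode*& head, int minFreq) {
    int removed = 0;
    wordNode** pp = &head;               // points at the link that may be rewritten
    while (*pp != nullptr) {
        if ((*pp)->freq_count < minFreq) {
            wordNode* doomed = *pp;
            *pp = doomed->next;          // unlink first...
            delete doomed;               // ...then release; new pairs with delete
            ++removed;
        } else {
            pp = &(*pp)->next;           // keep this node, move to the next link
        }
    }
    return removed;
}
```

Walking a pointer-to-pointer means removing the head node needs no special case, since `head` itself is just the first link.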

In short, what's wrong with this:

wordNode *previous, *temp = head;

while (temp != NULL){
    if (word == temp->myWord){
        temp->freq_count++;
        previous = temp;
        temp = temp->next;
        delete(previous);
    }
}

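I might be off on this, but basically I just need to find the frequency of each word inserted into the list, then delete the extra nodes containing that word until only the node with the highest frequency count is left to print. I tried to do this in insertWordDistinct(string word); I just don't know how.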
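Rather than inserting duplicates and deleting them afterwards, insertWordDistinct can stop as soon as it finds the word and only insert a node when the word is new. A minimal sketch, assuming a stripped-down class that reuses the question's member names (`head`, `size`, `insertWord`, `insertWordDistinct`) and a node whose `freq_count` starts at 1; the class layout is an assumption, and here `size` is counted in insertWordDistinct so it still reports every word read from the file:

```cpp
#include <cassert>
#include <string>

struct wordNode {
    std::string myWord;
    int freq_count = 1;                  // assumed: a new word has been seen once
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) {}
};

class wordCloud {
public:
    wordNode* head = nullptr;
    int size = 0;                        // total words read, duplicates included

    wordCloud() = default;
    wordCloud(const wordCloud&) = delete;            // owns raw pointers: no shallow copies
    wordCloud& operator=(const wordCloud&) = delete;
    ~wordCloud() {
        while (head != nullptr) {
            wordNode* next = head->next;
            delete head;
            head = next;
        }
    }

    void insertWord(const std::string& aWord) {
        wordNode* newWord = new wordNode(aWord);
        newWord->next = head;            // push onto the front
        head = newWord;
    }

    // Bumps the count on the existing node instead of adding a second node.
    void insertWordDistinct(const std::string& word) {
        ++size;
        for (wordNode* temp = head; temp != nullptr; temp = temp->next) {
            if (temp->myWord == word) {
                temp->freq_count++;
                return;                  // found: no new node
            }
        }
        insertWord(word);                // first occurrence
    }
};
```

With one node per word, the print loop no longer has to skip or delete anything: each node already holds the final count.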

2 Answers:

Answer 0 (score: 2)

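Your print loop isn't doing you any favors. It should be a simple enumeration filtered on the minimum frequency; no deleting, freeing, or other memory management should happen there. Just walk the list: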

void wordCloud::printWordCloud(int freq)
{
    int listSize = 0;
    int uniqSize = 0;
    for (wordNode *temp = head; temp; temp = temp->next)
    {
        if (temp->freq_count >= freq)
        {
            cout << temp->myWord << " <" << temp->freq_count << ">" << endl;
            listSize += temp->freq_count;
            ++uniqSize;
        }
    }

    cout << "\nThere are " << size << " words in the file.\n";
    cout << "\nThere are " << listSize << " words in the filtered list\n\n";
    cout << "\nThere are " << uniqSize << " unique words in the filtered list\n\n";
    system("pause");
}

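This should also put you back on track to manage the list properly in the wordCloud::~wordCloud() destructor, which is where the nodes should actually be deleted. There are plenty of other things I would do differently, but it's a learning process, so I won't spoil your party.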
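A destructor along those lines only has to walk the list once, saving each `next` pointer before deleting the node it lives in. A minimal sketch, assuming a class holding a `head` pointer like the question's; `push` is a hypothetical helper, and the `live` counter exists only so the example can show that every node is released:

```cpp
#include <cassert>
#include <string>

struct wordNode {
    static int live;                     // example-only instrumentation
    std::string myWord;
    int freq_count = 1;
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) { ++live; }
    ~wordNode() { --live; }
};
int wordNode::live = 0;

class wordCloud {
public:
    wordNode* head = nullptr;

    wordCloud() = default;
    wordCloud(const wordCloud&) = delete;            // avoid double deletes via copies
    wordCloud& operator=(const wordCloud&) = delete;

    ~wordCloud() {
        // Save next before deleting: the node's memory is gone afterwards.
        while (head != nullptr) {
            wordNode* next = head->next;
            delete head;
            head = next;
        }
    }

    // Hypothetical helper: push a word onto the front of the list.
    void push(const std::string& w) {
        wordNode* n = new wordNode(w);
        n->next = head;
        head = n;
    }
};
```

Deleting the copy operations matters here: a default copy would share `head`, and both destructors would then delete the same nodes.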


Update

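At the OP's request, below is a sample linked-list insert function that keeps the list sorted as it is built. While adapting it, he found significant differences from, and problems with, his original implementation. Hopefully it helps others as well.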

void wordCloud::insert(const std::string& aWord, unsigned int freq)
{
    // manufacture lower-case version of word
    // (make_lower is assumed to do what the question's changeToLowerCase does)
    std::string lcaseWord = make_lower(aWord);

    // search for the word by walking a pointer-to-pointer
    //  through the pointers in the linked list.
    wordNode** pp = &head;
    while (*pp && (*pp)->myWord < lcaseWord)
        pp = &(*pp)->next;

    if (*pp && !(lcaseWord < (*pp)->myWord))
    {   // word already in the list: add to its count
        (*pp)->freq_count += freq;
    }
    else
    {   // insert the node
        wordNode *node = new wordNode(lcaseWord);
        node->freq_count = freq;
        node->next = *pp;
        *pp = node;
        ++size;
    }
}
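To try the sorted insert on its own, here is a self-contained sketch of the same pointer-to-pointer technique as a free function. The node type, the `make_lower` implementation (the answer does not define one), `freq` defaulting to 1, and the `insertSorted`/`freeList` names are assumptions:

```cpp
#include <cassert>
#include <cctype>
#include <string>

struct wordNode {
    std::string myWord;
    unsigned int freq_count = 1;
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) {}
};

// Assumed implementation of the answer's make_lower helper.
std::string make_lower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Keeps the list in alphabetical order; a repeated word adds to its count.
void insertSorted(wordNode*& head, const std::string& aWord, unsigned int freq = 1) {
    std::string lcaseWord = make_lower(aWord);
    wordNode** pp = &head;                          // the link we may rewrite
    while (*pp && (*pp)->myWord < lcaseWord)        // skip smaller words
        pp = &(*pp)->next;
    if (*pp && (*pp)->myWord == lcaseWord) {
        (*pp)->freq_count += freq;                  // already present
    } else {
        wordNode* node = new wordNode(lcaseWord);   // splice in before *pp
        node->freq_count = freq;
        node->next = *pp;
        *pp = node;
    }
}

// Deletes every node and leaves head null.
void freeList(wordNode*& head) {
    while (head != nullptr) {
        wordNode* n = head->next;
        delete head;
        head = n;
    }
}
```

Because the search stops at the first word that is not smaller, a duplicate is always found at `*pp`, which is why the list never holds two nodes for the same word.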

Answer 1 (score: 0)

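I think that to print each word only once, you have to create a unique list containing the words from the original list along with how many times each occurs. To do that you only need two loops: one to take each word from the original list, and another to check whether that word is already in the unique list. So you make a second list and copy each word into it once; if a word appears more than once, you just increase its frequency.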
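The two-loop approach described above can be sketched as follows, assuming the original list holds one node per word occurrence in a `wordNode`-style node; `buildUniqueList` and `freeList` are hypothetical names:

```cpp
#include <cassert>
#include <string>

struct wordNode {
    std::string myWord;
    int freq_count = 1;
    wordNode* next = nullptr;
    explicit wordNode(const std::string& w) : myWord(w) {}
};

// Builds a new list with one node per distinct word in `original`.
wordNode* buildUniqueList(const wordNode* original) {
    wordNode* unique = nullptr;
    // Outer loop: take each word from the original list.
    for (const wordNode* src = original; src != nullptr; src = src->next) {
        // Inner loop: is the word already in the unique list?
        wordNode* found = nullptr;
        for (wordNode* u = unique; u != nullptr; u = u->next) {
            if (u->myWord == src->myWord) {
                found = u;
                break;
            }
        }
        if (found != nullptr) {
            found->freq_count++;                         // seen again: bump the count
        } else {
            wordNode* copy = new wordNode(src->myWord);  // first time: copy it once
            copy->next = unique;
            unique = copy;
        }
    }
    return unique;
}

// Deletes every node and leaves head null.
void freeList(wordNode*& head) {
    while (head != nullptr) {
        wordNode* n = head->next;
        delete head;
        head = n;
    }
}
```

The nested loops make this O(n²) in the number of words, which is fine for small files; the sorted insert in the other answer avoids building a second list altogether.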