I want to print the most visited websites/URLs in a browser

Time: 2016-04-19 19:21:12

Tags: c++ heap priority-queue

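The C++ code below prints the wrong result for the second and third output lines, and I cannot figure out why.

The code is complete and compiles in Visual Studio. It requires an input file named urlMgr.txt whose contents are URLs, one per line. These are the sample URLs I am using.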

  web.whatsapp.com 
  web.whatsapp.com 
  cplusplus.com/reference/algorithm/find_if 
  stackoverflow.com/questions/760221/breaking-in-stdfor-each-loop 
  mail.google.com/mail/u/0/#inbox 
  http://stackoverflow.com/questions/18085331/recursive-lambda-functions-in-c14
  mail.google.com/mail/u/0/#inbox 
  en.cppreference.com/w/cpp/language/lambda 
  https://www.google.co.in/?ion=1&espv=2#q=invariant%20meaning
  mail.google.com/mail/u/0/#inbox
  http://stackoverflow.com/questions/11699083/where-can-i-find-all-the-exception-guarantees-for-the-standard-containers-and-al
  https://www.google.co.in/?ion=1&espv=2#q=array+of+references:quora&start=10
  mail.google.com/mail/u/0/#inbox 
  web.whatsapp.com 
  quora.com/Whats-the-purpose-of-load-factor-in-hash-tables 
  https://www.quora.com/Whats-the-difference-between-the-rehash-and-reserve-methods-of-the-C++-unordered_map
  cplusplus.com/reference/unordered_map/unordered_map/load_factor
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor 
  cplusplus.com/max_load_factor 

The code is pasted below.

#include <iostream>
#include <string>
#include <cstring>   // memset
#include <vector>
#include <unordered_set>
#include <algorithm>
#include <fstream>
#include <sstream>
#include <functional>
#include <unordered_map>
#include <queue>
using namespace std;

class urlInfo
{
public:
    urlInfo(string &url):urlName(url),hitCount(1)
    {
    }

    int getHitCount() const
    {
        return hitCount;
    }

    string getURL()
    {
        return urlName;
    }

    string getURL() const
    {
        return urlName;
    }

    void updateHitCount()
    {
        hitCount++;
    }

    void setHitCount(int count)
    {
        hitCount = count;
    }

private:
    string urlName;
    int hitCount;
};

class urlInfoMaxHeap
{
public:
    bool operator() (urlInfo *url1, urlInfo *url2) const
    {
        if(url2->getHitCount() > url1->getHitCount())
            return true;
        else
            return false;
    }
};


bool operator==(const urlInfo &ui1,const urlInfo& ui2)
{
    //return (ui1.getURL().compare(ui2.getURL()) == 0) ? 1:0;

    return (ui1.getURL() == ui2.getURL());
}

namespace std
{
    template <> struct hash<urlInfo>
    {
        size_t operator()(urlInfo const & ui) const
        {
            return hash<string>()(ui.getURL());
        }
    };
}

class urlMgr
{
public:
    urlMgr(string &fileName)
    {
        ifstream rdStr;
        string str;
        rdStr.open(fileName.c_str(),ios::in);
        if(rdStr.is_open())
        {
            int len;
            rdStr.seekg(0,ios::end);
            len = rdStr.tellg();
            rdStr.seekg(0,ios::beg);
            str.reserve(len+1);
            char *buff = new char[len +1];
            memset(buff,0,len+1);
            rdStr.read(buff,len);
            rdStr.close();
            str.assign(buff);
            delete [] buff;
        }
        stringstream ss(str);
        string token;

        while(getline(ss,token,'\n'))
        {
            //cout<<endl<<token;
            addUrl(token);
        }

    }


    void addUrl(string &url)
    {
        unordered_map<string,urlInfo*>::iterator itr;
        itr = urls.find(url);
        if(itr == urls.end())
        {
            urlInfo *u = new urlInfo(url);
            urls[url] = u;
            maxHeap.push_back(u);
        }
        else
        {
            itr->second->updateHitCount();
            urlInfo* u = itr->second;
            vector<urlInfo*>::iterator vItr;
            vItr = find(maxHeap.begin(),maxHeap.end(),u);
            if(vItr!=maxHeap.end())
            {
                maxHeap.erase(vItr);
                maxHeap.push_back(u);
            }
        }

        make_heap(maxHeap.begin(),maxHeap.end(),urlInfoMaxHeap());
    }

    void releaseResources()
    {
        for_each(urls.begin(),urls.end(),[](pair<string,urlInfo*> p){
            urlInfo* u = p.second;
            delete u;
        });
    }

    void printHeap()
    {
        for_each(maxHeap.begin(),maxHeap.end(),[](urlInfo* u){
            cout<<endl<<u->getHitCount()<<"  "<<u->getURL();
        });
    }
private:
    unordered_map<string,urlInfo*> urls;
    vector<urlInfo*> maxHeap;
};


int main()
{
    string fileName("urlMgr.txt");
    urlMgr um(fileName);
    um.printHeap();
    um.releaseResources();
    cout<<endl<<"Successfully inserted the data"<<endl;
}

The output I get is

   8 cplusplus.com/max_load_factor
   3 web.whatsapp.com
   4 mail.google.com/mail/u/0/#inbox
   1 en.cppreference.com/w/cpp/language/lambda
   1 other url's and so on. //all other url's show count as 1.

What I expect is

   8 cplusplus.com/max_load_factor   
   4 mail.google.com/mail/u/0/#inbox
   3 web.whatsapp.com
   1 en.cppreference.com/w/cpp/language/lambda
   1 other url's and so on. //all other url's show count as 1.

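1 answer:

Answer 0 (score: 0)

After some debugging I found the problem. It lies in how you interpret what make_heap() does: it only guarantees that every parent is not smaller than its children, it does not sort the vector.

Consider this.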

url1 occurs 8 times
url2 occurs 4 times
url3 occurs 3 times

After calling make_heap(), you may get

maxHeap[0]=8                     8
maxHeap[1]=4                   4   3
maxHeap[2]=3

Or you may also get

maxHeap[0]=8                     8
maxHeap[1]=3                  3     4
maxHeap[2]=4

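Both of the above are valid max-heaps, but you are assuming that only the first one can occur. So in the code below you simply print the contents of maxHeap without realizing that you may be printing the second heap.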

  void printHeap()
{
    for_each(maxHeap.begin(),maxHeap.end(),[](urlInfo* u){
        cout<<endl<<u->getHitCount()<<"  "<<u->getURL();
    });
}
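As a small check of the claim that both layouts are valid heaps, here is a minimal sketch using std::is_heap on plain integers that stand in for the hit counts. The helper names isMaxHeap and isDescending are made up for this illustration and do not appear in the question's code.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Returns true when `v` satisfies the max-heap property that std::make_heap
// establishes with its default comparison (operator<).
bool isMaxHeap(const std::vector<int>& v) {
    return std::is_heap(v.begin(), v.end());
}

// Returns true when `v` is ordered from largest to smallest.
bool isDescending(const std::vector<int>& v) {
    return std::is_sorted(v.begin(), v.end(), std::greater<int>());
}
```

With these helpers, isMaxHeap({8, 4, 3}) and isMaxHeap({8, 3, 4}) both hold, while only {8, 4, 3} is in descending order. That is why iterating over the vector is not the same as reading the heap in priority order.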

To fix this, one approach is to take maxHeap[0], remove that first element, call make_heap() again, and then take maxHeap[0] again. Alternatively, you can use the following.

 while(!maxHeap.empty())
 {
     cout<<maxHeap.front()->getHitCount()<<" "<<maxHeap.front()->getURL()<<endl;
     std::pop_heap(maxHeap.begin(),maxHeap.end(),urlInfoMaxHeap());
     maxHeap.pop_back();
 }

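In the code above, pop_heap() moves the topmost element (the one with the highest priority according to the comparison function you passed to make_heap()) to the end and re-heapifies the remaining elements. You can then remove the last element.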
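Note that draining the heap this way empties the vector. If the original container should survive, one option is to drain a copy. The sketch below assumes a simplified (hit count, URL) pair in place of urlInfo*; Entry, fewerHits and drainByHits are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's urlInfo: (hit count, URL).
using Entry = std::pair<int, std::string>;

// Same ordering idea as urlInfoMaxHeap: a ranks below b when it has fewer hits.
bool fewerHits(const Entry& a, const Entry& b) {
    return a.first < b.first;
}

// Takes the entries by value, so the caller's vector is left untouched.
std::vector<Entry> drainByHits(std::vector<Entry> heap) {
    std::make_heap(heap.begin(), heap.end(), fewerHits);
    std::vector<Entry> ordered;
    ordered.reserve(heap.size());
    while (!heap.empty()) {
        // pop_heap moves the current maximum to the back and restores the
        // heap property on the remaining front range.
        std::pop_heap(heap.begin(), heap.end(), fewerHits);
        ordered.push_back(heap.back());
        heap.pop_back();
    }
    return ordered;
}
```

The returned vector is ordered from most to least hits; the relative order of entries with equal hit counts is not specified.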

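Also, I do not see what purpose the following serves in your code, since make_heap() is called on the whole vector right afterwards anyway: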

 vector<urlInfo*>::iterator vItr;
 vItr = find(maxHeap.begin(),maxHeap.end(),u);
 if(vItr!=maxHeap.end())
 {
     maxHeap.erase(vItr);
     maxHeap.push_back(u);
 }
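If the ranking is only needed once all URLs have been read, a simpler alternative (not what the question's code does) is to count hits in a hash map and order the results a single time at the end. The following is a rough sketch under that assumption; topUrls and its parameters are hypothetical names, and std::partial_sort is swapped in for the heap.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Counts each URL in a hash map, then orders the (URL, count) pairs once.
// Returns at most k pairs, most visited first.
std::vector<std::pair<std::string, int>> topUrls(
        const std::vector<std::string>& visits, std::size_t k) {
    std::unordered_map<std::string, int> hits;
    for (const std::string& url : visits) {
        ++hits[url];
    }

    std::vector<std::pair<std::string, int>> ranked(hits.begin(), hits.end());
    k = std::min(k, ranked.size());
    // Ties on the count are broken by URL so the result is deterministic.
    std::partial_sort(ranked.begin(),
                      ranked.begin() + static_cast<std::ptrdiff_t>(k),
                      ranked.end(),
        [](const std::pair<std::string, int>& a,
           const std::pair<std::string, int>& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;
        });
    ranked.resize(k);
    return ranked;
}
```

This avoids the linear find() and the full make_heap() on every insertion, at the cost of not having a ranking available while the input is still being read.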