Efficient computation of frequent and top-k elements in data streams

Time: 2011-12-19 13:10:09

Tags: c++ algorithm

Below is the pseudocode for this algorithm.

SpaceSaving algorithm
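
As I understand it, the update rule can be sketched roughly as follows. This is a minimal sketch, assuming the usual description of Space-Saving: keep at most k counters; increment the counter of an item that is already monitored; otherwise use a free slot if there is one; otherwise evict the item with the smallest count and give the newcomer that count plus one. The function name `space_saving` and its vector-based interface are hypothetical, chosen only for illustration, and C++11 is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not taken from the paper): applies the Space-Saving
// update to every item of a stream and returns the monitored items with
// their estimated counts.
std::map<std::string, int> space_saving(const std::vector<std::string>& stream,
                                        std::size_t k) {
    std::map<std::string, int> counters;
    if (k == 0) return counters;           // nothing can be monitored
    for (const std::string& item : stream) {
        auto hit = counters.find(item);
        if (hit != counters.end()) {
            ++hit->second;                 // already monitored: bump its count
        } else if (counters.size() < k) {
            counters.emplace(item, 1);     // free slot: start counting at 1
        } else {
            // Table full: evict the entry with the smallest count; the
            // newcomer inherits that count plus one.
            auto smallest = std::min_element(
                counters.begin(), counters.end(),
                [](const std::pair<const std::string, int>& a,
                   const std::pair<const std::string, int>& b) {
                    return a.second < b.second;
                });
            int inherited = smallest->second + 1;
            counters.erase(smallest);
            counters.emplace(item, inherited);
        }
    }
    return counters;
}
```

With this rule the counts always sum to the number of items seen, which gives a quick sanity check on any implementation.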

Below is how I implemented it.

#include <iostream>
#include <fstream>
#include <string>
#include <map>

typedef std::map<std::string, int> collection_t;
typedef collection_t::iterator collection_itr_t;

collection_t T;

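// Return an iterator to the entry with the smallest count (T must be non-empty).
collection_itr_t get_smallest_key() {
    collection_itr_t min_key = T.begin();
    collection_itr_t key     = min_key;
    ++key;  // advance only the scanning iterator; "key = ++min_key" would also
            // move min_key, skipping the first entry (and hitting end() when
            // the map holds a single element)
    while ( key != T.end() ) {
        if ( key->second < min_key->second )
            min_key =  key;
        ++key;
    }
    return min_key;
}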
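void space_saving_frequent( const std::string &i, int k ) {
    if ( k <= 0 )
        return;                              // no counters to maintain
    collection_itr_t hit = T.find(i);
    if ( hit != T.end() )
        ++hit->second;                       // already monitored: increment its count
    else if ( T.size() < static_cast<collection_t::size_type>(k) ) {
        T.insert(std::make_pair(i, 1 ));     // free slot: start counting at 1
    } else {
        // Table full: replace the entry with the smallest count; the new
        // item inherits that count plus one.
        collection_itr_t j = get_smallest_key();
        int cnt = j->second + 1;
        T.erase(j);
        T.insert(std::make_pair(i, cnt));
    }
}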
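int main ( int argc, char **argv) {
    if ( argc < 2 ) {
        std::cerr << "usage: <program> <input-file>" << std::endl;
        return 1;
    }
    std::ifstream ifs(argv[1]);
    if ( !ifs || ifs.peek() == EOF )   // unreadable or empty file
        return 1;
    std::string line; 
    while( std::getline(ifs,line) ) {
        // Take the text after the last '='; if there is none, rfind returns
        // npos and npos + 1 wraps to 0, so the whole line is used.
        std::string::size_type left   = line.rfind('=') + 1;
        std::string::size_type length = line.length();
        // Note: "length - left - 1" drops the final character of the line.
        std::string i     = line.substr(left, length - left - 1);  
        space_saving_frequent(i, 5);
    }
    ifs.close();
    // Print the monitored items and their estimated counts.
    for ( collection_itr_t it = T.begin(); it != T.end(); ++it )
        std::cout << it->first << " " << it->second << std::endl;
    return 0;
}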

Link to the original paper: http://dimacs.rutgers.edu/~graham/pubs/papers/freqcacm.pdf

However, the code does not work, and I cannot figure out where I went wrong.

0 answers:

No answers