C++: Reading a dataset and checking whether a vector<class> is a subset of a vector<class>

Posted: 2014-12-10 15:42:59

Tags: c++ vector subset

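I have the following code. It creates a vector Dataset in which every element is itself a vector (one per input line). It also creates a vector S.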

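I want to find which vectors in Dataset contain all the elements of a vector in S. I am obviously doing something wrong, because for the following example, where Dataset is: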
a b c
a d
a b d

and S is:
a b

it should print: 0 2

but for me it prints: 0 1 2

#include <iostream>
#include <fstream>
#include <sstream>
#include <string.h>
#include <string>
#include <time.h>
#include <vector>
#include <algorithm>

using namespace std;


class StringRef
{
private:
    char const*     begin_;
    int             size_;

public:
    int size() const { return size_; }
    char const* begin() const { return begin_; }
    char const* end() const { return begin_ + size_; }

    StringRef( char const* const begin, int const size )
        : begin_( begin )
        , size_( size )
    {}

    bool operator<(const StringRef& obj) const
    {
        return (strcmp(begin(),obj.begin()) > 0 );
    }

};


/************************************************
 * Checks if vector B is subset of vector A     *
 ************************************************/

bool isSubset(std::vector<StringRef> A, std::vector<StringRef> B)
{
    std::sort(A.begin(), A.end());
    std::sort(B.begin(), B.end());
    return std::includes(A.begin(), A.end(), B.begin(), B.end());
}


vector<StringRef> split3( string const& str, char delimiter = ' ' )
{
    vector<StringRef>   result;

    enum State { inSpace, inToken };

    State state = inSpace;
    char const*     pTokenBegin = 0;    // Init to satisfy compiler.
    for(auto it = str.begin(); it != str.end(); ++it )
    {
        State const newState = (*it == delimiter? inSpace : inToken);
        if( newState != state )
        {
            switch( newState )
            {
            case inSpace:
                result.push_back( StringRef( pTokenBegin, &*it - pTokenBegin ) );
                break;
            case inToken:
                pTokenBegin = &*it;
            }
        }
        state = newState;
    }
    if( state == inToken )
    {
        result.push_back( StringRef( pTokenBegin, &str.back() - pTokenBegin ) );
    }
    return result;
}

int main() {

    vector<vector<StringRef> > Dataset;
    vector<vector<StringRef> > S;

    ifstream input("test.dat");
    long count = 0;
    int sec, lps;
    time_t start = time(NULL);

    cin.sync_with_stdio(false); //disable synchronous IO

    for( string line; getline( input, line ); )
    {
        Dataset.push_back(split3( line ));
        count++;
    };
    input.close();
    input.clear();

    input.open("subs.dat");
    for( string line; getline( input, line ); )
    {
        S.push_back(split3( line ));
    };



    for ( std::vector<std::vector<StringRef> >::size_type i = 0; i < S.size(); i++ )
    {
        for(std::vector<std::vector<StringRef> >::size_type j=0; j<Dataset.size();j++)
        {

            if (isSubset(Dataset[j], S[i]))
            {
                cout << j << " ";
            }

        }
    }

    sec = (int) time(NULL) - start;
    cerr << "C++   : Saw " << count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } else
        cerr << endl;

    return 0;
}

1 answer:

Answer 0 (score: 2)

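Your StringRef type is dangerous: it holds a const char* pointer but has no notion of ownership. Nothing keeps the characters it points to alive, so the pointer can become invalid at any point after the object is constructed.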

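That is exactly what happens here. You have a string (line) and create StringRefs that point into its internal buffer. The next getline call overwrites line, and line is destroyed when the loop ends, so every StringRef is left dangling. The comparisons in isSubset then read invalid memory (undefined behavior), which is why the output is wrong.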

To avoid this problem, store each line's tokens in a vector<std::string>, so every token owns its own characters.