Question

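My question is pretty simple: I would like to use lambdas the same way I can use a functor as a comparator. Let me explain. I have two big structs, each with its own implementation of operator<, and I also have a useless class (just the name of the class in the context of this question) that uses the two structs. Everything looks like this: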

struct be_less
{
    //A lot of stuff
    int val;
    be_less(int p_v):val(p_v){}
    bool operator<(const be_less& p_other) const
    {
        return val < p_other.val;
    }
};

struct be_more
{
    //A lot of stuff
    int val;
    be_more(int p_v):val(p_v){}
    bool operator<(const be_more& p_other) const
    {
        return val > p_other.val;
    }
};

class useless
{
    priority_queue<be_less> less_q;
    priority_queue<be_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

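I would like to remove the duplication between the two structs. The simplest idea is to make the struct a template and provide two functors to do the comparison work: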

template<typename Comp>
struct be_all
{
    //Lot of stuff, better do not duplicate
    int val;
    be_all(int p_v):val{p_v}{}
    bool operator<(const be_all<Comp>& p_other) const
    {
        return Comp()(val,p_other.val);
    }
};

class comp_less
{
public:
    bool operator()(int p_first,
                    int p_second)
    {
        return p_first < p_second;
    }
};

class comp_more
{
public:
    bool operator()(int p_first,
                    int p_second)
    {
        return p_first > p_second;
    }
};

typedef be_all<comp_less> all_less;
typedef be_all<comp_more> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

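This works pretty well: I now have no duplication in the struct code, at the price of two additional function objects. Please note that I have greatly simplified the implementation of operator<; the real code does much more than compare two ints.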

Then I was thinking about how to do the same thing using lambdas (just as an experiment). The only working solution I was able to implement is:

template<typename Comp>
struct be_all
{
    int val;
    function<bool(int,int)> Comparator;
    be_all(Comp p_comp,int p_v):
        Comparator(move(p_comp)),
        val{p_v}
    {}
    bool operator<(const be_all& p_other) const
    {
        return Comparator(val, p_other.val);
    }
};

auto be_less = [](int p_first,
          int p_second)
{
    return p_first < p_second;
};

auto be_more = [](int p_first,
          int p_second)
{
    return p_first > p_second;
};

typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(be_less,elem);
            more_q.emplace(be_more,elem);
        }
    }
};

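This implementation not only adds a new member to the struct holding the data, it also performs very poorly. I prepared a small test in which I create one instance of each of the useless classes shown here, each time feeding the constructor a vector filled with 2 million integers. The results are as follows: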

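Takes 48ms to create the first useless class
Takes 228ms to create the second useless class (functors)
Takes 557ms to create the third useless class (lambdas)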

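Clearly the price I pay for removing the duplication is very high, and in the original code the duplication is still there. Note how bad the performance of the third implementation is: more than ten times slower than the original. I thought the third implementation was slower than the second because of the additional parameter in the be_all constructor... but: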

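There is actually a fourth case, where I still use lambdas but get rid of the Comparator member and the additional parameter in be_all. The code is the following: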

template<typename Comp>
struct be_all
{
    int val;
    be_all(int p_v):val{p_v}
    {}
    bool operator<(const be_all& p_other) const
    {
        return Comp(val, p_other.val);
    }
};

bool be_less = [](int p_first,
          int p_second)
{
    return p_first < p_second;
};

bool be_more = [](int p_first,
          int p_second)
{
    return p_first > p_second;
};

typedef be_all<decltype(be_less)> all_less;
typedef be_all<decltype(be_more)> all_more;

class useless
{
    priority_queue<all_less> less_q;
    priority_queue<all_more> more_q;
public:
    useless(const vector<int>& p_data)
    {
        for(auto elem:p_data)
        {
            less_q.emplace(elem);
            more_q.emplace(elem);
        }
    }
};

If I remove auto from the lambda declarations and use bool instead, the code builds even though I use Comp(val, p_other.val) in operator<.

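What is very strange to me is that this fourth implementation (lambdas without the Comparator member) is even slower than the others. In the end, the average performance I was able to record is the following: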

48ms
228ms
557ms
698ms

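Why is the functor so much faster than the lambdas in this scenario? I expected lambdas to perform at least as well as a plain functor. Can anyone comment on this? And is there a technical reason why the fourth implementation is slower than the third?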

PS：

The compiler I am using is g++ 4.8.2 with -O3. In my test I create one instance of each useless class and measure the time required using chrono:

namespace benchmark
{
    template<typename T>
    long run()
    {
        auto start=chrono::high_resolution_clock::now();
        T t(data::plenty_of_data);
        auto stop=chrono::high_resolution_clock::now();
        return chrono::duration_cast<chrono::milliseconds>(stop-start).count();
    }
}

and

cout<<"Bad code:  "<<benchmark::run<bad_code::useless>()<<"ms\n";
cout<<"Bad code2: "<<benchmark::run<bad_code2::useless>()<<"ms\n";
cout<<"Bad code3: "<<benchmark::run<bad_code3::useless>()<<"ms\n";
cout<<"Bad code4: "<<benchmark::run<bad_code4::useless>()<<"ms\n";

The input set of integers is the same for all of them; plenty_of_data is a vector filled with 2 million integers.

Thanks for your time

Answer 1

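You are not comparing the runtime of a lambda and a functor. Instead, the numbers show the difference between using a functor and using std::function. std::function<R(Args...)> can store any Callable that satisfies the signature R(Args...). It does this through type erasure. So the difference you see comes from the overhead of a virtual call in std::function::operator().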
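To make the type-erasure point concrete, here is a minimal sketch of the general technique. Every name in it (callable_base, callable_impl, erased_compare, compare_erased, compare_direct) is made up for illustration and is not taken from any real standard library; an actual std::function is considerably more elaborate.

```cpp
#include <memory>
#include <utility>

// All names here are hypothetical; this only shows the general shape of
// type erasure, not the code of any real standard library.
struct callable_base {
    virtual ~callable_base() {}
    virtual bool call(int p_first, int p_second) const = 0;
};

// One instantiation per stored callable type F; it forwards to the callable.
template <typename F>
struct callable_impl : callable_base {
    F f;
    explicit callable_impl(F p_f) : f(std::move(p_f)) {}
    bool call(int p_first, int p_second) const override {
        return f(p_first, p_second);
    }
};

// The wrapper only holds a base pointer, so the static type of the stored
// callable is no longer visible at the call site.
class erased_compare {
    std::unique_ptr<callable_base> impl_;
public:
    template <typename F>
    erased_compare(F p_f) : impl_(new callable_impl<F>(std::move(p_f))) {}

    bool operator()(int p_first, int p_second) const {
        return impl_->call(p_first, p_second);  // virtual dispatch on every call
    }
};

// Comparison through the erased wrapper: goes through the virtual call.
inline bool compare_erased(const erased_compare& p_comp, int p_a, int p_b) {
    return p_comp(p_a, p_b);
}

// Comparison through the callable's own type: the target is known statically.
template <typename Comp>
bool compare_direct(Comp p_comp, int p_a, int p_b) {
    return p_comp(p_a, p_b);
}
```

Both helpers return the same result for the same lambda. The difference is only in how the call is reached: compare_direct sees the closure type and can typically be inlined, while compare_erased has to go through the base-class pointer.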

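For example, the libc++ implementation (3.5) has a base class template<class _Fp, class _Alloc, class _Rp, class ..._ArgTypes> __base with a virtual operator(). std::function stores a __base<...>*. Whenever you create a std::function with a callable F, an object of type template<class F, class _Alloc, class R, class ...Args> class __func is created, which inherits from __base<...> and overrides the virtual operator().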
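By contrast, a lambda can be compared with a functor on equal terms only if its closure type is used directly, with no std::function in between. One way to do that, sketched below, is to move the comparison out of the element's operator< and pass the lambda as the comparator of std::priority_queue. This is a restructuring of the question's design, not something the question's code does, and the names item, top_after_fill, largest and smallest are hypothetical.

```cpp
#include <queue>
#include <vector>

// Hypothetical element type standing in for the question's large structs.
struct item {
    int val;
    explicit item(int p_v) : val(p_v) {}
};

// Fills a priority_queue ordered by p_comp and returns the top value.
// p_data must be non-empty, otherwise calling top() is undefined behaviour.
template <typename Comp>
int top_after_fill(const std::vector<int>& p_data, Comp p_comp) {
    // Comp is the comparator's own type (the closure type for a lambda),
    // so the call target is known at compile time.
    std::priority_queue<item, std::vector<item>, Comp> q(p_comp);
    for (int elem : p_data) {
        q.emplace(elem);
    }
    return q.top().val;
}

// "less" ordering: the largest value ends up on top.
inline int largest(const std::vector<int>& p_data) {
    return top_after_fill(p_data, [](const item& p_a, const item& p_b) {
        return p_a.val < p_b.val;
    });
}

// "greater" ordering: the smallest value ends up on top.
inline int smallest(const std::vector<int>& p_data) {
    return top_after_fill(p_data, [](const item& p_a, const item& p_b) {
        return p_a.val > p_b.val;
    });
}
```

Because a closure type has no default constructor before C++20, the lambda object is handed to the queue's constructor instead of being created with Comp(). Whether the compiler actually inlines the comparison still depends on the compiler and optimization level, so any speedup should be measured rather than assumed.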
