Question

Suppose you have written a matrix class with some operations:

class matrix
{
public:
    double operator()(size_t i, size_t j) const;
    ...
};

matrix operator*(const matrix &lhs, const matrix &rhs);
...

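It makes sense to defer the evaluation of some matrix expressions: m0 * m1 * m2 * m3 * m4 (which is a series of four operator* calls) can benefit from the dynamic-programming matrix chain multiplication algorithm; the very common m0 * m1^t has a very efficient dgemm implementation; and so on.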
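To make the chain-multiplication point concrete, here is a minimal sketch of the classic dynamic program that finds the cheapest parenthesization cost. The name `min_chain_cost` and the `dims` encoding are assumptions made for this illustration; they are not part of the matrix class.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// dims has n + 1 entries; matrix k has shape dims[k] x dims[k + 1].
// Returns the minimal number of scalar multiplications for the whole product.
std::uint64_t min_chain_cost(const std::vector<std::uint64_t> &dims)
{
    if (dims.size() < 3) return 0; // zero or one matrix: nothing to multiply
    const std::size_t n = dims.size() - 1;
    // cost[i][j] = cheapest way to multiply matrices i..j (inclusive).
    std::vector<std::vector<std::uint64_t>> cost(n, std::vector<std::uint64_t>(n, 0));
    for (std::size_t len = 2; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len - 1;
            cost[i][j] = std::numeric_limits<std::uint64_t>::max();
            for (std::size_t k = i; k < j; ++k) { // split as (i..k) * (k+1..j)
                const std::uint64_t c = cost[i][k] + cost[k + 1][j]
                                      + dims[i] * dims[k + 1] * dims[j + 1];
                if (c < cost[i][j]) cost[i][j] = c;
            }
        }
    }
    return cost[0][n - 1];
}
```

For shapes 1000x10, 10x1000, 1000x10, evaluating left to right costs 2 * 10^7 scalar multiplications, while the best order costs 2 * 10^5. An eager operator* cannot pick the better order, because it sees only two operands at a time.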

It therefore pays to defer the actual computation until it is needed. This changes the class above to:

class matrix
{
private:
    /*
    * Pointer to an abstract base class - either an actual matrix, 
    *    or an expression tree. */
    std::shared_ptr<matrix_imp> m_imp;

public:
    // Forces compaction - 
    double operator()(size_t i, size_t j) const;
    ...
};

/* Lazy; creates a matrix with an expression tree using the
*    internals of lhs and rhs. */
matrix operator*(const matrix &lhs, const matrix &rhs);
...

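Each matrix holds a pointer to a base-class object that can range from a real matrix to a complicated expression tree. Each operation tries to form a matrix using the laziest possible change to the internal implementation. Some operations have no choice but to actually evaluate things, compact the expression tree, and set the internal implementation to an actual matrix.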
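A minimal sketch of how such a design could look follows. Only `matrix`, `matrix_imp` and `m_imp` come from the declarations above; `dense_imp`, `product_imp`, `compact()` and `is_lazy()` are hypothetical names, and the element-by-element evaluation is chosen for brevity, not speed.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Abstract node: either a dense matrix or an unevaluated expression.
struct matrix_imp {
    virtual ~matrix_imp() = default;
    virtual std::size_t rows() const = 0;
    virtual std::size_t cols() const = 0;
    virtual bool is_dense() const = 0;
    virtual double at(std::size_t i, std::size_t j) const = 0;
};

// Hypothetical leaf: actual storage, row-major.
struct dense_imp : matrix_imp {
    std::size_t r, c;
    std::vector<double> data;
    dense_imp(std::size_t r_, std::size_t c_, double fill = 0.0)
        : r(r_), c(c_), data(r_ * c_, fill) {}
    std::size_t rows() const override { return r; }
    std::size_t cols() const override { return c; }
    bool is_dense() const override { return true; }
    double at(std::size_t i, std::size_t j) const override { return data[i * c + j]; }
};

// Hypothetical inner node: a product that keeps BOTH operands alive.
struct product_imp : matrix_imp {
    std::shared_ptr<const matrix_imp> lhs, rhs;
    product_imp(std::shared_ptr<const matrix_imp> l, std::shared_ptr<const matrix_imp> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    std::size_t rows() const override { return lhs->rows(); }
    std::size_t cols() const override { return rhs->cols(); }
    bool is_dense() const override { return false; }
    double at(std::size_t i, std::size_t j) const override {
        double s = 0.0;
        for (std::size_t k = 0; k < lhs->cols(); ++k) s += lhs->at(i, k) * rhs->at(k, j);
        return s;
    }
};

class matrix {
    mutable std::shared_ptr<const matrix_imp> m_imp;
    explicit matrix(std::shared_ptr<const matrix_imp> imp) : m_imp(std::move(imp)) {}
public:
    matrix(std::size_t r, std::size_t c, double fill)
        : m_imp(std::make_shared<dense_imp>(r, c, fill)) {}
    bool is_lazy() const { return !m_imp->is_dense(); }
    // Evaluate the tree into dense storage and drop the tree.
    void compact() const {
        if (m_imp->is_dense()) return;
        auto d = std::make_shared<dense_imp>(m_imp->rows(), m_imp->cols());
        for (std::size_t i = 0; i < d->r; ++i)
            for (std::size_t j = 0; j < d->c; ++j)
                d->data[i * d->c + j] = m_imp->at(i, j);
        m_imp = std::move(d); // releases the operands the tree kept alive
    }
    // Forces compaction.
    double operator()(std::size_t i, std::size_t j) const { compact(); return m_imp->at(i, j); }
    // Lazy: only builds a tree node sharing the internals of lhs and rhs.
    friend matrix operator*(const matrix &lhs, const matrix &rhs) {
        return matrix(std::make_shared<product_imp>(lhs.m_imp, rhs.m_imp));
    }
};
```

Note that `product_imp` holds shared ownership of both operands, so a lazy result keeps its inputs in memory until something calls `compact()`.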

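The problem is that, in practice, this causes huge memory overhead in a very common case. Say you read from a file a long, narrow matrix x = x_{p X q}, p >> q, store x^t x in a variable, and discard x. With lazy evaluation the variable still holds the expression tree, and therefore x itself, so the memory held is pq >> qq. Load these in a loop, and it becomes a serious problem. (Of course, client code can force compaction after each load by calling operator(), but requiring this without algorithmic justification is ugly and error prone.)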

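Initially, I thought the move ctor was a good point for automatic compaction - it is exactly the point where a temporary becomes a named object, and it is named objects that cause the increased memory consumption, so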

matrix(matrix &&other); // <- Force compaction only here

would seem to solve everything, e.g.,

auto res = // <- temp becoming named
    a * // temp
    b * // temp
    c + // temp
    2 * // temp
    d;

But can it be counted on? For example, consider

matrix load_xtx(const string &f_name)
{
    matrix x = ...
    return x.t() * x; 
}

auto xtx = load_xtx("foo.hdf5"); // (*)

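Is the compiler forbidden from doing something similar to NRVO at (*), that is, simply constructing the result in place? Even if it is, might a compiler optimize the move away in other cases?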
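A small experiment along these lines can be sketched with a hypothetical `probe` type that counts calls to its move constructor (`probe`, `load` and `moves_for_named_result` are made-up names for illustration). Since C++17 the language requires a returned prvalue to initialize the named object directly, and earlier standards already permitted that elision, so the move constructor is not a dependable hook here.

```cpp
// Hypothetical stand-in for matrix: counts move-constructor calls.
struct probe {
    static int moves;
    probe() = default;
    probe(const probe &) = default;
    probe(probe &&) noexcept { ++moves; } // where compaction would be triggered
    friend probe operator*(const probe &, const probe &) { return probe{}; }
};
int probe::moves = 0;

probe load()
{
    probe x;
    return x * x; // a prvalue is returned directly
}

int moves_for_named_result()
{
    probe::moves = 0;
    auto xtx = load(); // the temporary "becomes named" here
    (void)xtx;
    return probe::moves; // number of move-constructor calls on the way
}
```

Under C++17 rules no move constructor runs between the `return` statement and `xtx`, so compaction placed there would be skipped for exactly the load_xtx pattern above.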

Answer 1

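Since the "internal pointer" approach cannot give all the flexibility that lazy evaluation needs, the typical solution used by C++ numerical libraries is to define specialized classes that implement the lazy evaluation mechanism. The old SO question Lazy evaluation in C++ and its best answers show the basics of such a design and some sample code.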

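Although I am not an expert, I think good examples of this architecture are the numerical libraries Eigen (here some details about its implementation) and Blitz++, which relies heavily on templates (I did not find up-to-date documentation on the web describing its internals, but this article describes some parts of its engine and also gives a broader overview of the "expression template" technique).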
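As a rough sketch of the idea (not how Eigen or Blitz++ actually implement it), operator* can return a distinct expression type, and the only way to obtain a matrix from it is a converting constructor. That constructor is then a guaranteed evaluation point, independent of copy elision. The names `et_sketch` and `mul_expr` are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

namespace et_sketch {

// Hypothetical lazy node: a product of two operands, computed per element.
// It stores references, so it must not outlive its operands.
template <class L, class R>
struct mul_expr {
    const L &lhs;
    const R &rhs;
    std::size_t rows() const { return lhs.rows(); }
    std::size_t cols() const { return rhs.cols(); }
    double operator()(std::size_t i, std::size_t j) const {
        double s = 0.0;
        for (std::size_t k = 0; k < lhs.cols(); ++k) s += lhs(i, k) * rhs(k, j);
        return s;
    }
};

class matrix {
    std::size_t r_, c_;
    std::vector<double> d_; // row-major storage
public:
    matrix(std::size_t r, std::size_t c, double fill = 0.0)
        : r_(r), c_(c), d_(r * c, fill) {}
    // Converting constructor: the one place where an expression is evaluated.
    template <class L, class R>
    matrix(const mul_expr<L, R> &e) : r_(e.rows()), c_(e.cols()), d_(r_ * c_) {
        for (std::size_t i = 0; i < r_; ++i)
            for (std::size_t j = 0; j < c_; ++j)
                d_[i * c_ + j] = e(i, j);
    }
    std::size_t rows() const { return r_; }
    std::size_t cols() const { return c_; }
    double operator()(std::size_t i, std::size_t j) const { return d_[i * c_ + j]; }
};

// Lazy: builds an expression object, performs no arithmetic.
inline mul_expr<matrix, matrix> operator*(const matrix &a, const matrix &b)
{
    return {a, b};
}

} // namespace et_sketch
```

With this layout, `matrix xtx = a * b;` always evaluates, because the right-hand side is not a matrix. The trade-off is that `auto e = a * b;` names the unevaluated expression instead, so callers who want a matrix have to spell out the type.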
