Question

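I am working on a project where performance is critical. The application processes a huge amount of data. The code is written in C++, and I need to make some changes.

The following code is given (it is not my code, and I have reduced it to a minimum):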

template<int PARAM1, int PARAM2>
void process() {
    // processing the data
}

void processTheData (int param1, int param2) { // wrapper

    if (param1 == 1 && param2 == 1) { // Ugly looking block of if's
        process<1, 1>();
    } else if (param1 == 1 && param2 == 2) {
        process<1, 2>();
    } else if (param1 == 1 && param2 == 3) {
        process<1, 3>();
    } else if (param1 == 1 && param2 == 4) {
        process<1, 4>();
    } else if (param1 == 2 && param2 == 1) {
        process<2, 1>();
    } else if (param1 == 2 && param2 == 2) {
        process<2, 2>();
    } else if (param1 == 2 && param2 == 3) {
        process<2, 3>();
    } else if (param1 == 2 && param2 == 4) {
        process<2, 4>();
    }   // and so on....

}

The main function:

int main(int argc, char *argv[]) {

    int factor1 = atoi(argv[1]);
    int factor2 = atoi(argv[2]);

    // choose some optimal param1 and param2
    int param1 = choseTheOptimal(factor1, factor2);
    int param2 = choseTheOptimal(factor1, factor2);

    processTheData(param1, param2); //start processing

    return 0;
}

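I hope the code is clear.

The functions:

process is the core function that processes the data,
processTheData is a wrapper around the process function.

The params (param1 and param2) take only a limited number of values (say, about 10 x 10).

The values of param1 and param2 are not known before execution.

If I simply rewrite the process function so that it takes function parameters instead of template constants (i.e. process(int PARAM1, int PARAM2)), the processing is about 10 times slower.

Because of the above, PARAM1 and PARAM2 must be compile-time constants of the process function.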
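
To illustrate why the constants can matter, here is a minimal sketch with a hypothetical kernel. The names processFixed and processRuntime and the loop body are assumptions made for illustration, not the real process function. With template arguments the inner loop bound is known at compile time, so the compiler may unroll or vectorize it. The runtime version computes the same result but has to handle any bound. The actual speedup depends on the real code and the compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical kernel: the inner loop bound and the offset are template
// constants, so the compiler is free to unroll the inner loop completely.
template<int PARAM1, int PARAM2>
long processFixed(const std::vector<int>& data) {
    long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (int k = 0; k < PARAM1; ++k) {
            sum += data[i] * (k + PARAM2);
        }
    }
    return sum;
}

// Same computation with ordinary function parameters: the bound is only
// known at run time.
long processRuntime(const std::vector<int>& data, int param1, int param2) {
    long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (int k = 0; k < param1; ++k) {
            sum += data[i] * (k + param2);
        }
    }
    return sum;
}
```

Both functions return the same value for the same inputs; only the point at which the parameters become known differs.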

Is there any smart way to get rid of this ugly block of if's in the processTheData function?

Answer 1

Like this.

#include <array>
#include <utility>

template<int PARAM1, int PARAM2>
void process() {
    // processing the data
}

// make a jump table to call process<X, Y> where X is known and Y varies    
template<std::size_t P1, std::size_t...P2s>
constexpr auto make_table_over_p2(std::index_sequence<P2s...>)
{
    return std::array<void (*)(), sizeof...(P2s)>
    {
        &process<int(P1), int(P2s)>...
    };
}

// make a table of jump tables to call process<X, Y> where X and Y both vary    
template<std::size_t...P1s, std::size_t...P2s>
constexpr auto make_table_over_p1_p2(std::index_sequence<P1s...>, std::index_sequence<P2s...> p2s)
{
    using element_type = decltype(make_table_over_p2<0>(p2s));
    return std::array<element_type, sizeof...(P1s)>
    {
        make_table_over_p2<P1s>(p2s)...
    };
}


void processTheData (int param1, int param2) { // wrapper

    // make a 10x10 jump table
    static const auto table = make_table_over_p1_p2(
        std::make_index_sequence<10>(), 
        std::make_index_sequence<10>()
    ) ;

    // todo - put some limit checks here

    // dispatch
    table[param1][param2]();
}
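
The "todo" about limit checks could be filled in roughly as follows. This is only a sketch: kLimit, lastCall and processTheDataChecked are hypothetical names, process is replaced by a stand-in that records which instantiation ran, and throwing std::out_of_range is just one possible policy. Note that the table is indexed from 0, so with a 10x10 table the valid values are 0 to 9.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Hypothetical observer so that the effect of a dispatch is visible.
inline int& lastCall() { static int v = -1; return v; }

// Stand-in for the real kernel: records which instantiation ran.
template<int PARAM1, int PARAM2>
void process() { lastCall() = PARAM1 * 100 + PARAM2; }

// One row of the jump table: P1 fixed, P2 varies.
template<std::size_t P1, std::size_t... P2s>
constexpr auto make_row(std::index_sequence<P2s...>) {
    return std::array<void (*)(), sizeof...(P2s)>{
        &process<int(P1), int(P2s)>...
    };
}

// The full table: both P1 and P2 vary.
template<std::size_t... P1s, std::size_t... P2s>
constexpr auto make_table(std::index_sequence<P1s...>,
                          std::index_sequence<P2s...> p2s) {
    using row_type = decltype(make_row<0>(p2s));
    return std::array<row_type, sizeof...(P1s)>{
        make_row<P1s>(p2s)...
    };
}

constexpr std::size_t kLimit = 10;  // assumed size of each dimension

void processTheDataChecked(int param1, int param2) {
    static const auto table = make_table(std::make_index_sequence<kLimit>(),
                                         std::make_index_sequence<kLimit>());

    // Reject anything outside [0, kLimit) before touching the table.
    if (param1 < 0 || param2 < 0 ||
        static_cast<std::size_t>(param1) >= kLimit ||
        static_cast<std::size_t>(param2) >= kLimit) {
        throw std::out_of_range("param1/param2 outside the dispatch table");
    }

    table[static_cast<std::size_t>(param1)][static_cast<std::size_t>(param2)]();
}
```

If the real parameters start at 1, as in the question, either subtract 1 before indexing or make the table one entry larger in each dimension.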

Answer 2

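This is what I call the magic switch. It takes a run-time value (within a specified range) and turns it into a compile-time value.

#include <cstddef>
#include <type_traits>
#include <utility>

namespace details 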
{
  template<std::size_t I>
  using index_t = std::integral_constant<std::size_t, I>;

  template<class F>
  using f_result = std::result_of_t< F&&(index_t<0>) >;
  template<class F>
  using f_ptr = f_result<F>(*)(F&& f);
  template<class F, std::size_t I>
  f_ptr<F> get_ptr() {
    return [](F&& f)->f_result<F> {
      return std::forward<F>(f)(index_t<I>{});
    };
  }
  template<class F, std::size_t...Is>
  auto dispatch( F&& f, std::size_t X, std::index_sequence<Is...> ) {
    static const f_ptr<F> table[]={
      get_ptr<F, Is>()...
    };
    return table[X](std::forward<F>(f));
  }
}
template<std::size_t max, class F>
details::f_result<F>
dispatch( F&& f, std::size_t I ) {
  return details::dispatch( std::forward<F>(f), I, std::make_index_sequence<max>{} );
}

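What this does is build a jump table that converts run-time data into a compile-time constant. I use a lambda because that makes it nice and generic, and I pass it an integral constant. An integral constant is a stateless run-time object whose type carries the constant.

Use example: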

template<std::size_t a, std::size_t b>
void process() {
    static_assert( sizeof(int[a+1]) + sizeof(int[b+1]) >= 0 );
}

constexpr int max_factor_1 = 10;
constexpr int max_factor_2 = 10;

int main() {
    int factor1 = 1;
    int factor2 = 5;

    dispatch<max_factor_1>(
      [factor2](auto factor1) {
        dispatch<max_factor_2>(
          [factor1](auto factor2) {
            process< decltype(factor1)::value, decltype(factor2)::value >();
          },
          factor2
        );
      },
      factor1
    );
}

where max_factor_1 and max_factor_2 are constexpr values or expressions.

This uses C++14 for auto lambdas and for the constexpr implicit conversion from integral constants.
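
The two properties of std::integral_constant relied on here can be seen in isolation in the following small sketch. The helper names valueFromType and arrayLengthFor3 are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Reads the constant back out of the argument's type; the object itself
// carries no state, so the parameter does not even need a name.
template<class Constant>
constexpr std::size_t valueFromType(Constant) {
    return Constant::value;
}

// A generic (auto) lambda sees a distinct parameter type for every value,
// so decltype(n)::value can be used where a constant expression is required,
// for example as an array bound.
inline std::size_t arrayLengthFor3() {
    auto makeLength = [](auto n) {
        int buffer[decltype(n)::value] = {};
        return sizeof(buffer) / sizeof(buffer[0]);
    };
    return makeLength(std::integral_constant<std::size_t, 3>{});
}

// The implicit constexpr conversion to the underlying value type.
constexpr std::size_t four = std::integral_constant<std::size_t, 4>{};
static_assert(four == 4, "conversion happens at compile time");

// The object is empty: all the information lives in the type.
static_assert(std::is_empty<std::integral_constant<std::size_t, 4>>::value,
              "integral_constant is a stateless object");
```

This is why the lambdas in the use example can write decltype(factor1)::value as a template argument even though factor1 is an ordinary function parameter.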


Answer 3

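This is what I came up with. It uses fewer fancy features (only enable_if, no variadic templates or function pointers), but it is also less generic. Pasting the code into godbolt shows that compilers are able to optimize the dispatch away completely for the example code, which may give a performance advantage in real code.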

#include <type_traits>

template <int param1, int param2>
void process() {
    static_assert(sizeof(int[param1 + 1]) + sizeof(int[param2 + 1]) > 0);
}

template <int limit2, int param1, int param2>
std::enable_if_t<(param2 > limit2)> pick_param2(int) {
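    // Terminates the recursion; reached at run time only if p exceeds limit2.
    // Note: a string literal is always true, so this assert never fires.
    static_assert("Invalid value for parameter 2");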
}

template <int limit2, int param1, int param2>
std::enable_if_t<param2 <= limit2> pick_param2(int p) {
    if (p > 0) {
        pick_param2<limit2, param1, param2 + 1>(p - 1);
    } else {
        process<param1, param2>();
    }
}

template <int limit1, int limit2, int param>
std::enable_if_t<(param > limit1)> pick_param1(int, int) {
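    // Terminates the recursion; reached at run time only if p1 exceeds limit1.
    // Note: a string literal is always true, so this assert never fires.
    static_assert("Invalid value for parameter 1");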
}

template <int limit1, int limit2, int param>
std::enable_if_t<param <= limit1> pick_param1(int p1, int p2) {
    if (p1 > 0) {
        pick_param1<limit1, limit2, param + 1>(p1 - 1, p2);
    } else {
        pick_param2<limit2, param, 0>(p2);
    }
}

template <int limit_param1, int limit_param2>
void pick_params(int param1, int param2) {
    pick_param1<limit_param1, limit_param2, 0>(param1, param2);
}

int main() {
    int p1 = 3;
    int p2 = 5;
    pick_params<10, 10>(p1, p2);
}
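
In the code above an out-of-range run-time value silently does nothing, because the terminating overloads have an empty effect. One possible variant, sketched here for a single parameter, reports the problem at run time instead. The names pick, pick_checked, process1 and lastPicked are hypothetical, and throwing std::out_of_range is an assumption about the desired policy, not part of the original answer.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical observer so that the selected instantiation is visible.
inline int& lastPicked() { static int v = -1; return v; }

// Stand-in for the real kernel.
template <int param>
void process1() { lastPicked() = param; }

// Terminating overload: instantiated once, for param == limit + 1, so that
// the template recursion stops. It is reached at run time only when the
// value is larger than limit.
template <int limit, int param>
std::enable_if_t<(param > limit)> pick_checked(int) {
    throw std::out_of_range("runtime value exceeds the compile-time limit");
}

// Counts p down to zero while counting param up; when p reaches zero,
// param equals the original run-time value.
template <int limit, int param>
std::enable_if_t<(param <= limit)> pick_checked(int p) {
    if (p > 0) {
        pick_checked<limit, param + 1>(p - 1);
    } else {
        process1<param>();
    }
}

// Entry point: accepts values in [0, limit], both ends included.
template <int limit>
void pick(int p) {
    if (p < 0) {
        throw std::out_of_range("negative runtime value");
    }
    pick_checked<limit, 0>(p);
}
```

As in the original, the limit is inclusive, so pick<10> accepts the values 0 to 10.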

I would be interested in profiling results.
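
A rough way to obtain such numbers is a small timing helper built on std::chrono, sketched below. The name averageNanoseconds is hypothetical, and this is not a substitute for a real profiler: with trivial workloads the optimizer may remove the measured code entirely, so it should be used with the real process body and representative data.

```cpp
#include <chrono>

// Calls f() `repetitions` times and returns the average wall-clock time
// per call in nanoseconds, measured with a monotonic clock.
template <class F>
double averageNanoseconds(F&& f, int repetitions) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        f();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / repetitions;
}
```

Wrapping each dispatch variant in a lambda and passing it to this helper gives comparable per-call averages, provided the workload is large enough to dominate the clock overhead.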

Optimizing C++ template execution
