Question

I am new to the Spark Java API. I want to apply two aggregations (sum and count) in a single groupBy on my Dataset.

My Dataset looks like this:

+---------+------------+
| account |   amount   |
+---------+------------+
| aaaaaa  |   1000     |
| aaaaaa  |   2000     |
| bbbbbb  |   4000     |
| cccccc  |   5000     |
| cccccc  |   3000     |
+---------+------------+

I want to get a Dataset like this:

+---------+------------+------------+
| account |    sum     |    count   |
+---------+------------+------------+
| aaaaaa  |   3000     |   2        |
| bbbbbb  |   4000     |   1        |
| cccccc  |   8000     |   2        |
+---------+------------+------------+

Could someone please show me how to express this with the Spark Java API?

Answer 1

#include <iostream>
#include <type_traits>  // needed for std::is_floating_point

template <template <typename, typename> class Op>
class Function
{
};

template <typename A, typename B, bool is_f = std::is_floating_point<A>::value || std::is_floating_point<B>::value > struct Operator;

template <typename A, typename B>
struct Operator<A, B, false>
{};


template <typename A, typename B>
struct Operator<A, B, true>
{};

using FunctionOperator = Function<Operator>;


int main(int argc, char * argv[]){
    std::cout << "hi!\n";
    return 0;
}
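
For the aggregation asked about in the question, a minimal sketch with the Spark Java API might look like the following. It assumes a local `SparkSession` and an `amount` column of type long. The class and helper names (`AccountAggregation`, `sampleData`, `sumAndCount`) are hypothetical and only serve the illustration. The idea is to call `groupBy` once and pass both aggregate expressions to `agg`.

```java
import static org.apache.spark.sql.functions.count;
import static org.apache.spark.sql.functions.sum;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AccountAggregation {

    // Hypothetical helper: rebuilds the sample data from the question.
    // Assumes "amount" is stored as a long.
    static Dataset<Row> sampleData(SparkSession spark) {
        StructType schema = new StructType()
                .add("account", DataTypes.StringType)
                .add("amount", DataTypes.LongType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("aaaaaa", 1000L),
                RowFactory.create("aaaaaa", 2000L),
                RowFactory.create("bbbbbb", 4000L),
                RowFactory.create("cccccc", 5000L),
                RowFactory.create("cccccc", 3000L));
        return spark.createDataFrame(rows, schema);
    }

    // Groups by account once and computes both aggregates in the same agg call.
    // The orderBy is only there to make the printed output stable.
    static Dataset<Row> sumAndCount(Dataset<Row> ds) {
        return ds.groupBy("account")
                .agg(sum("amount").alias("sum"),
                     count("amount").alias("count"))
                .orderBy("account");
    }

    public static void main(String[] args) {
        // Assumption: a local master is used for the demonstration.
        SparkSession spark = SparkSession.builder()
                .appName("AccountAggregation")
                .master("local[*]")
                .getOrCreate();
        sumAndCount(sampleData(spark)).show();
        spark.stop();
    }
}
```

Passing several columns to a single `agg` call avoids running two separate groupBy operations and joining their results afterwards.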

Two groupBy aggregations on a Dataset with the Spark Java API

1 answer: