Is it possible to pivot a Spark Dataset on more than two columns?

Asked: 2019-05-31 06:53:51

Tags: java apache-spark pivot apache-spark-dataset

I have the following dataset:

val data = Seq(("Banana",1000,"USA","a"), ("Carrots",1500,"USA","b"), ("Beans",1600,"USA","c"),
  ("Banana",400,"China","a"), ("Carrots",1200,"China","b"), ("Beans",1500,"China","c"))
  .toDF("Product", "Amount", "Country", "Streams")
+---------+--------+---------+---------+
| Product | Amount | Country | Streams |
+---------+--------+---------+---------+
| Banana  |   1000 | USA     | a       |
| Carrots |   1500 | USA     | b       |
| Beans   |   1600 | USA     | c       |
| Banana  |    400 | China   | a       |
| Carrots |   1200 | China   | b       |
| Beans   |   1500 | China   | c       |
+---------+--------+---------+---------+

I was wondering whether it is possible to pivot on two columns at once in a Spark Dataset (Java), so that the output looks like the following:

    +---------+---------+---------+---------+---------+---------+---------+
    | Product | USA     | USA     | USA     | China   | China   | China   |
    +         +---------+---------+---------+---------+---------+---------+
    |         | a       | b       | c       | a       | b       | c       |
    +---------+---------+---------+---------+---------+---------+---------+
    | Banana  | 1000    | null    | null    | 400     | null    | null    |
    | Carrots | null    | 1500    | null    | null    | 1200    | null    |
    | Beans   | null    | null    | 1600    | null    | null    | 1500    |
    +---------+---------+---------+---------+---------+---------+---------+
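
For reference, one commonly suggested workaround (not taken from the original post, just a sketch assuming the column names above) is to concatenate the two pivot columns into a single helper column and pivot on that, which flattens the nested header into columns such as USA_a and China_b; the Country_Streams name below is made up for the example:

    import org.apache.spark.sql.functions.{col, concat_ws, first}

    // Combine the two pivot columns into a single helper column, then pivot on it.
    val pivoted = data
      .withColumn("Country_Streams", concat_ws("_", col("Country"), col("Streams")))
      .groupBy("Product")
      .pivot("Country_Streams")   // distinct values of the helper column become the new column names
      .agg(first("Amount"))

    // Yields one column per (Country, Streams) pair, with null where no matching row exists.
    pivoted.show()

This does not produce a true two-level header, but the same information ends up in one column per (Country, Streams) pair; the equivalent Java Dataset code would call functions.concat_ws, pivot and functions.first in the same way.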

0 Answers:

No answers yet