How to unpivot a table based on multiple columns

Asked: 2018-07-10 01:04:31

Tags: scala apache-spark

I'm using Scala and Spark to unpivot a table that looks like this:

+---+----------+--------+-------+------+-----+
| ID|   Date   |  Type1 | Type2 | 0:30 | 1:00|
+---+----------+--------+-------+------+-----+
|  G| 12/3/2018|  Import|Voltage| 3.5  | 6.8 |
|  H| 13/3/2018|  Import|Voltage| 7.5  | 9.8 |
|  H| 13/3/2018|  Export|   Watt| 4.5  | 8.9 |
|  H| 13/3/2018|  Export|Voltage| 5.6  | 9.1 |
+---+----------+--------+-------+------+-----+

I want to transpose it as follows:

+---+---------+----+--------------+--------------+-----------+-----------+
| ID|     Date|Time|Import-Voltage|Export-Voltage|Import-Watt|Export-Watt|
+---+---------+----+--------------+--------------+-----------+-----------+
|  G|12/3/2018|0:30|           3.5|             0|          0|          0|
|  G|12/3/2018|1:00|           6.8|             0|          0|          0|
|  H|13/3/2018|0:30|           7.5|           5.6|          0|        4.5|
|  H|13/3/2018|1:00|           9.8|           9.1|          0|        8.9|
+---+---------+----+--------------+--------------+-----------+-----------+

The Date and Time columns should also be merged, like:

12/3/2018 0:30

1 Answer:

Answer 0 (score: 2)

This is not a straightforward task, but one approach is to:

  1. Pack each time column and its corresponding value into a list of time-value pairs
  2. Flatten (explode) that list into one row per time-value pair
  3. Perform a groupBy-pivot-agg transformation, with Time as part of the groupBy keys and the combined Types as the pivot column, aggregating the value paired with each time
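Before looking at the Spark version, the pairing-and-flatten idea in steps 1 and 2 can be sketched with plain Scala collections (the `Row` case class and sample data here are hypothetical stand-ins, not the Spark API; `flatMap` plays the role of `explode`):

```scala
// Hypothetical row shape: each row carries its two time-slot values.
case class Row(id: String, date: String, types: String, v0030: Double, v0100: Double)

val rows = Seq(
  Row("G", "12/3/2018", "Import-Voltage", 3.5, 6.8),
  Row("H", "13/3/2018", "Import-Voltage", 7.5, 9.8)
)

// Step 1: pack the time columns into (time, value) pairs;
// Step 2: flatten so each pair becomes its own row.
val exploded = rows.flatMap { r =>
  Seq(("0:30", r.v0030), ("1:00", r.v0100)).map { case (t, v) =>
    (r.id, r.date, t, r.types, v)
  }
}
exploded.foreach(println)
```

Each input row yields one output row per time slot, which is exactly the shape step 3 then pivots.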

Sample code below:

import org.apache.spark.sql.functions._
// Assumes an existing SparkSession named `spark` (as in spark-shell);
// the implicits enable toDF and the $"col" syntax.
import spark.implicits._

val df = Seq(
  ("G", "12/3/2018", "Import", "Voltage", 3.5, 6.8),
  ("H", "13/3/2018", "Import", "Voltage", 7.5, 9.8),
  ("H", "13/3/2018", "Export", "Watt", 4.5, 8.9),
  ("H", "13/3/2018", "Export", "Voltage", 5.6, 9.1)
).toDF("ID", "Date", "Type1", "Type2", "0:30", "1:00")

df.
  // Step 1: pack each time column with its value into (time, value) structs
  withColumn("TimeValMap", array(
    struct(lit("0:30").as("_1"), $"0:30".as("_2")),
    struct(lit("1:00").as("_1"), $"1:00".as("_2"))
  )).
  // Step 2: explode into one row per (time, value) pair
  withColumn("TimeVal", explode($"TimeValMap")).
  withColumn("Time", $"TimeVal._1").
  // Combine Type1/Type2 into the pivot key, e.g. "Import-Voltage"
  withColumn("Types", concat_ws("-", array($"Type1", $"Type2"))).
  // Step 3: pivot on Types, aggregating the value paired with each time
  groupBy("ID", "Date", "Time").pivot("Types").agg(first($"TimeVal._2")).
  orderBy("ID", "Date", "Time").
  na.fill(0.0).
  show
// +---+---------+----+--------------+-----------+--------------+
// | ID|     Date|Time|Export-Voltage|Export-Watt|Import-Voltage|
// +---+---------+----+--------------+-----------+--------------+
// |  G|12/3/2018|0:30|           0.0|        0.0|           3.5|
// |  G|12/3/2018|1:00|           0.0|        0.0|           6.8|
// |  H|13/3/2018|0:30|           5.6|        4.5|           7.5|
// |  H|13/3/2018|1:00|           9.1|        8.9|           9.8|
// +---+---------+----+--------------+-----------+--------------+
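The question also asked for Date and Time merged into a single column; in Spark that could be done by adding something like `withColumn("DateTime", concat_ws(" ", $"Date", $"Time"))` to the pipeline above. A plain-Scala sketch of the merged key, using hypothetical tuples mirroring the pivoted output:

```scala
// Stand-in for the pivoted (ID, Date, Time) rows shown above.
val pivoted = Seq(
  ("G", "12/3/2018", "0:30"),
  ("G", "12/3/2018", "1:00"),
  ("H", "13/3/2018", "0:30")
)

// Merge Date and Time into one key, e.g. "12/3/2018 0:30"
// (the collection analogue of Spark's concat_ws(" ", $"Date", $"Time")).
val withDateTime = pivoted.map { case (id, date, time) =>
  (id, s"$date $time")
}
withDateTime.foreach(println)
```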