Suppose I have the following DataFrame:
val df = spark.sparkContext.parallelize(Seq(
  ("A", "12", 50),
  ("A", "13", 100),
  ("A", "14", 30),
  ("B", "15", 40),
  ("C", "16", 60),
  ("C", "17", 70)
)).toDF("Name", "Time", "Value")
Then I pivot on "Time":
val pivoted = df.groupBy($"Name").
  pivot("Time").
  agg(coalesce(sum($"Value"), lit(0)))
pivoted.show()
The result is:
+----+----+----+----+----+----+----+
|Name|  12|  13|  14|  15|  16|  17|
+----+----+----+----+----+----+----+
|   B|null|null|null|  40|null|null|
|   C|null|null|null|null|  60|  70|
|   A|  50| 100|  30|null|null|null|
+----+----+----+----+----+----+----+
So far, so good. What I want is to add a column next to column "17" that holds the sum of each row. The expected output would be:
+----+----+----+----+----+----+----+----+
|Name|  12|  13|  14|  15|  16|  17| sum|
+----+----+----+----+----+----+----+----+
|   B|null|null|null|  40|null|null|  40|
|   C|null|null|null|null|  60|  70| 130|
|   A|  50| 100|  30|null|null|null| 180|
+----+----+----+----+----+----+----+----+
(Naively,) I tried adding "withColumn", but it failed:
val pivotedWithSummation = df.groupBy($"Name").
  pivot("Time").
  agg(coalesce(sum($"Value"), lit(0))).
  withColumn("summation", sum($"Value"))
I came across this answer, but I wasn't able to apply it :/
I'm using Scala v2.11.8 and Spark 2.3.1.
Thanks!
Answer 0 (score: 1)
Compute the sum of the values from the original input DataFrame, then join it with the pivoted DataFrame:
scala> val pivoted = df.groupBy($"Name").pivot("Time").agg(coalesce(sum($"Value"),lit(0)))
pivoted: org.apache.spark.sql.DataFrame = [Name: string, 12: bigint ... 5 more fields]
scala> pivoted.show
+----+----+----+----+----+----+----+
|Name|  12|  13|  14|  15|  16|  17|
+----+----+----+----+----+----+----+
|   B|null|null|null|  40|null|null|
|   C|null|null|null|null|  60|  70|
|   A|  50| 100|  30|null|null|null|
+----+----+----+----+----+----+----+
scala> val sumOfValuesDF = df.groupBy($"Name").sum("value")
sumOfValuesDF: org.apache.spark.sql.DataFrame = [Name: string, sum(value): bigint]
scala> sumOfValuesDF.show
+----+----------+
|Name|sum(value)|
+----+----------+
|   B|        40|
|   C|       130|
|   A|       180|
+----+----------+
scala> val pivotedWithSummation = pivoted.join(sumOfValuesDF, "Name")
pivotedWithSummation: org.apache.spark.sql.DataFrame = [Name: string, 12: bigint ... 6 more fields]
scala> pivotedWithSummation.show
+----+----+----+----+----+----+----+----------+
|Name|  12|  13|  14|  15|  16|  17|sum(value)|
+----+----+----+----+----+----+----+----------+
|   B|null|null|null|  40|null|null|        40|
|   C|null|null|null|null|  60|  70|       130|
|   A|  50| 100|  30|null|null|null|       180|
+----+----+----+----+----+----+----+----------+
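As a possible alternative to the join, the row total could also be computed directly from the pivoted columns. The following is only a sketch under a few assumptions: it runs in a local SparkSession (in spark-shell, "spark" already exists), it treats every column other than "Name" as a pivoted value column, and the names valueCols, rowSum and pivotedWithSum are made up for illustration. Each null is replaced with 0 before adding, because adding null to a number yields null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, lit, sum}

// Assumption: a local session for illustration; spark-shell provides `spark` already.
val spark = SparkSession.builder().master("local[*]").appName("pivot-row-sum").getOrCreate()
import spark.implicits._

val df = Seq(
  ("A", "12", 50), ("A", "13", 100), ("A", "14", 30),
  ("B", "15", 40), ("C", "16", 60), ("C", "17", 70)
).toDF("Name", "Time", "Value")

val pivoted = df.groupBy($"Name").pivot("Time").agg(sum($"Value"))

// Every column except the grouping key is one of the pivoted "Time" columns.
val valueCols = pivoted.columns.filter(_ != "Name")

// Treat missing cells as 0, then add the columns together row by row.
val rowSum = valueCols.map(c => coalesce(col(c), lit(0))).reduce(_ + _)

// Append the total as the last column, named "sum" as in the expected output.
val pivotedWithSum = pivoted.withColumn("sum", rowSum)
pivotedWithSum.show()
```

For the sample data this gives 40 for B, 130 for C and 180 for A, the same totals as the join above. If the join approach is preferred, the total column can be named at aggregation time, for example df.groupBy($"Name").agg(sum($"Value").as("sum")), so the joined result has a "sum" column instead of "sum(value)".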