I need to convert the following two-column DataFrame into a single row (long to wide).
+--------+-----+
| udate| cc|
+--------+-----+
|20090622| 458|
|20090624|31068|
|20090626| 151|
|20090629| 148|
|20090914| 453|
+--------+-----+
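For reference, a minimal sketch to reproduce this sample in spark-shell (udate is assumed to be a string, and only the five rows shown above are included; the full data has more dates, as the output further down shows):

// Build the sample two-column DataFrame from the question's data.
import org.apache.spark.sql.functions._
import spark.implicits._

val result_df = Seq(
  ("20090622", 458),
  ("20090624", 31068),
  ("20090626", 151),
  ("20090629", 148),
  ("20090914", 453)
).toDF("udate", "cc")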
I need this format:
+-----+--------+--------+--------+
|udate|20090622|20090624|20090626| etc.
+-----+--------+--------+--------+
|   cc|     458|   31068|     151| etc.
+-----+--------+--------+--------+
I ran this:
result_df.groupBy($"udate").pivot("udate").agg(max($"cc")).show()
but the result is a matrix in which every row value is spread across every column:
+--------+--------+--------+--------+--------+--------+---
| udate|20090622|20090624|20090626|20090629|20090703|200
+--------+--------+--------+--------+--------+--------+---
|20090622| 458| null| null| null| null|
|20090624| null| 31068| null| null| null|
|20090626| null| null| 151| null| null|
|20090629| null| null| null| 148| null|
|20090703| null| null| null| null| 362|
|20090704| null| null| null| null| null|
|20090715| null| null| null| null| null|
|20090718| null| null| null| null| null|
|20090721| null| null| null| null| null|
|20090722| null| null| null| null| null|
I expected that pivoting a single-column dataset would produce a single-row dataset.
How do I modify the pivot call so that the result set is pivoted into one row?
Answer (score: 2)
tl;dr As of Spark 2.4.0, this boils down to simply using groupBy() with no grouping columns.
val solution = result_df.groupBy().pivot("udate").agg(first("cc"))
scala> solution.show
+--------+--------+--------+--------+--------+
|20090622|20090624|20090626|20090629|20090914|
+--------+--------+--------+--------+--------+
| 458| 31068| 151| 148| 453|
+--------+--------+--------+--------+--------+
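The empty groupBy() puts the whole DataFrame into a single group, which is why pivot then yields exactly one row. If the dates are known up front, you can also pass them to pivot explicitly; this is a sketch under that assumption, and it spares Spark the extra job it otherwise runs to collect the distinct udate values:

// Supplying the pivot values explicitly skips the pass Spark otherwise
// needs to compute the distinct dates (values assumed known up front).
val dates = Seq("20090622", "20090624", "20090626", "20090629", "20090914")
val pinned = result_df.groupBy().pivot("udate", dates).agg(first("cc"))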
If you really need a name in the first column, just prepend a literal column (withColumn would also work, but it appends the column at the end; see the sketch after the output below):
val betterSolution = solution.select(lit("cc") as "udate", $"*")
scala> betterSolution.show
+-----+--------+--------+--------+--------+--------+
|udate|20090622|20090624|20090626|20090629|20090914|
+-----+--------+--------+--------+--------+--------+
| cc| 458| 31068| 151| 148| 453|
+-----+--------+--------+--------+--------+--------+
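For completeness, a withColumn variant of the same step (the viaWithColumn name is made up for illustration): withColumn appends the new column at the end, so a select is needed to move it to the front.

// withColumn appends "udate" as the last column; select reorders it first.
val viaWithColumn = solution
  .withColumn("udate", lit("cc"))
  .select("udate", solution.columns: _*)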