I have a DataFrame like the one below:
+----------+----+----+----+
|      date|col1|col2|col3|
+----------+----+----+----+
|2021-05-01|  20|  30|  40|
|2021-05-02| 200| 300|  10|
+----------+----+----+----+
I want to transpose (pivot) this DataFrame into:
+-----+----------+----------+
|col  |2021-05-01|2021-05-02|
+-----+----------+----------+
|col1 |        20|       200|
|col2 |        30|       300|
|col3 |        40|        10|
+-----+----------+----------+
Other Stack Overflow posts, such as this and this, helped me to some extent, but I have not been able to find a solution.
My approaches so far (all failed attempts):
scala> dUnion.groupBy("date").pivot("date").agg(first("col1")).show()
+----------+----------+----------+
|      date|2021-05-01|2021-05-02|
+----------+----------+----------+
|2021-05-02|      null|       200|
|2021-05-01|        20|      null|
+----------+----------+----------+
scala> dUnion.groupBy("date", "col1", "col2", "col3").pivot("date").agg(first("col1")).show()
+----------+----+----+----+----------+----------+
|      date|col1|col2|col3|2021-05-01|2021-05-02|
+----------+----+----+----+----------+----------+
|2021-05-02| 200| 300|  10|      null|       200|
|2021-05-01|  20|  30|  40|        20|      null|
+----------+----+----+----+----------+----------+
The closest I could get is:
scala> dUnion.groupBy().pivot("date").agg(first("col1")).show()
+----------+----------+
|2021-05-01|2021-05-02|
+----------+----------+
|        20|       200|
+----------+----------+
Answer 0 (score: 1)
This is possible, but I think it is somewhat slow.
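import org.apache.spark.sql.functions.first
import spark.implicits._ // supplies the tuple encoder that flatMap needs

val schema = df.schema
// Melt every row into (date, columnName, value) triples.
// String.valueOf avoids a ClassCastException when a column is not a string.
val longForm = df.flatMap { row =>
  val date = String.valueOf(row.get(0))
  (1 until row.size).map(i => (date, schema(i).name, String.valueOf(row.get(i))))
}
// Pivot the dates back out as columns, one row per original column name.
longForm.groupBy("_2").pivot("_1").agg(first("_3"))
  .withColumnRenamed("_2", "col").show(10, false)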
+----+----------+----------+
|col |2021-05-01|2021-05-02|
+----+----------+----------+
|col3|40        |10        |
|col1|20        |200       |
|col2|30        |300       |
+----+----------+----------+
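Since the flatMap above works row by row, a variant that stays within built-in column expressions may be worth trying. The sketch below is not the method from the answer above: it swaps the flatMap for Spark SQL's `stack` generator to unpivot, then pivots on `date` as before. It assumes that all value columns share one type (integers here, which is a guess about the question's data) and that the first column is named `date`. The names `valueCols`, `stackExpr` and `transposed` are only illustrative. Whether this is actually faster would need to be measured on real data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Reuses the active session in spark-shell, or starts a local one otherwise.
val spark = SparkSession.builder().master("local[*]").appName("transpose-sketch").getOrCreate()
import spark.implicits._

// Sample data from the question; integer value columns are an assumption.
val df = Seq(
  ("2021-05-01", 20, 30, 40),
  ("2021-05-02", 200, 300, 10)
).toDF("date", "col1", "col2", "col3")

// Build "stack(3, 'col1', `col1`, 'col2', `col2`, 'col3', `col3`) as (col, value)"
// from the schema, so the value columns do not have to be listed by hand.
val valueCols = df.columns.filter(_ != "date")
val stackExpr = valueCols
  .map(c => s"'$c', `$c`")
  .mkString(s"stack(${valueCols.length}, ", ", ", ") as (col, value)")

// Unpivot to long form: one (date, col, value) row per original cell.
val longForm = df.selectExpr("date", stackExpr)

// Pivot the dates back out as columns, one row per original column name.
val transposed = longForm
  .groupBy("col")
  .pivot("date")
  .agg(first("value"))
  .orderBy("col")

transposed.show(false)
```

If the distinct dates are known up front, passing them explicitly, as in `pivot("date", Seq("2021-05-01", "2021-05-02"))`, spares Spark the extra job it otherwise runs to discover the pivot values.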