Question

I have a DataFrame with a large number of columns, more than 100 (sample below):

+----+----+----+----+----+----+----+----+----+----+----+...
|c1  |c2  |c3  |c4  |c5  |c6  |c7  |c8  |type|clm |val |...
+----+----+----+----+----+----+----+----+----+----+----+...
|  11| 5.0| 3.0| 3.0| 3.0| 4.0| 3.0| 3.0| t1 | a  |5   |...
+----+----+----+----+----+----+----+----+----+----+----+...
|  31| 5.0| 3.0| 3.0| 3.0| 4.0| 3.0| 3.0| t2 | b  |6   |...
+----+----+----+----+----+----+----+----+----+----+----+...
|  11| 5.0| 3.0| 3.0| 3.0| 4.0| 3.0| 3.0| t1 | a  |9   |...
+----+----+----+----+----+----+----+----+----+----+----+...

I want to turn the values of one column into multiple columns, so I tried the following code:

df.groupBy("type").pivot("clm").agg(first("val")).show()

This turns the row values into columns, but the other columns (c1 to c8) are not part of the resulting DataFrame.

Grouping by all 100 columns works, but the processing time is far too long.

df.groupBy("c1","c2","c3","c4","c5","c6","c7","c8","c100","type").pivot("clm").agg(first("val")).show()

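The approach below is somewhat faster, but it creates a huge number of columns: every distinct value of clm is multiplied by every other column, so the result has nearly 500 columns.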

df.groupBy("type")
  .pivot("clm")
  .agg(
    first("val"),
    first("c1"),
    first("c2"),
    first("c3"),
    first("c4"),
    first("c5"),
    first("c6"),
    first("c7"),
    first("c8"),
    first("c100")
  ).show()
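To see where the extra columns come from, here is a minimal sketch on a cut-down version of the data. It assumes Spark SQL is on the classpath and a local SparkSession; the object name `PivotColumnCount`, the method `build`, and the reduction to two stand-in columns (c1, c2) are hypothetical choices for illustration, not part of the original post. When `agg` receives several aggregates after `pivot`, Spark emits one output column per (pivot value, aggregate) pair, which is why the column count multiplies.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object PivotColumnCount {
  // Builds the pivoted frame from a hypothetical miniature of the question's data.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Only c1 and c2 stand in for the ~100 other columns (assumption for brevity).
    val df = Seq(
      (11, 5.0, "t1", "a", 5),
      (31, 5.0, "t2", "b", 6),
      (11, 5.0, "t1", "a", 9)
    ).toDF("c1", "c2", "type", "clm", "val")

    // Three aggregates per pivot value, mirroring the third attempt above.
    df.groupBy("type")
      .pivot("clm")
      .agg(first("val"), first("c1"), first("c2"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pivot-column-count")
      .getOrCreate()

    val wide = build(spark)
    // 1 grouping column + 2 pivot values (a, b) * 3 aggregates = 7 columns
    println(wide.columns.length)
    wide.printSchema()

    spark.stop()
  }
}
```

With about 100 carried columns and a handful of distinct clm values, the same multiplication would account for the roughly 500 columns described above.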

Pivot based on groupBy while keeping all other columns

0 answers: