Question

I have a DataFrame that looks like this:

+-----+------+-------+-------+
| tom | dick | harry | type  |
+-----+------+-------+-------+
| 100 |  200 |   150 | type1 |
| 200 |  200 |   300 | type2 |
+-----+------+-------+-------+

I need to transform it into this:

+--------+-------+-------+
| person | type1 | type2 |
+--------+-------+-------+
| tom    |   100 |   200 |
| dick   |   200 |   200 |
| harry  |   150 |   300 |
+--------+-------+-------+

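I have been going around in circles with pivot, unpivot, melt and stack, but none of them seems to give me what I need (though I may be missing something). Ideally I would end up with something dynamic, so that I don't have to hardcode the names tom, dick and harry.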

Answer 1

I did not find an official unpivot() or melt() function, but I was able to come up with the following:

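import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.functions.first
// Outside spark-shell, also: import spark.implicits._ (needed for toDF and for flatMap's encoder)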

val df = Seq(
    (100, 200, 150, "type1"),
    (200, 200, 300, "type2")
    ).toDF("tom", "dick", "harry", "type")

val columns = df.columns

df.flatMap(r => {
    val buf = ArrayBuffer[(String, String, Int)]()
    val t = r.getAs[String]("type")
    columns.foreach(c => {
        c match {
            case "type" =>
            case _ => buf += Tuple3(c, t, r.getAs[Int](c))
        }
    })
    buf.toIterable
}).toDF("person", "type", "value")
  .groupBy("person")
  .pivot("type")
  .agg(first("value"))
  .show()

The result is:

+------+-----+-----+
|person|type1|type2|
+------+-----+-----+
| harry|  150|  300|
|  dick|  200|  200|
|   tom|  100|  200|
+------+-----+-----+

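Essentially, this takes two steps: first the DataFrame is exploded into a long format with "person", "type" and "value" columns, then it is grouped by person and pivoted on type, taking the first() record in each group.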

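The number of columns you have can be arbitrary, but the code does require a "type" column, and it requires all the values to be of the same data type (Int in this example).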

Hopefully this is general enough for your use case.
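The two steps described above (unpivot to a long format, then pivot on type) can also be sketched with Spark SQL's stack generator instead of a hand-written flatMap. This is only a sketch under a few assumptions: the local SparkSession and its app name are there purely so the snippet is self-contained, every column other than "type" is treated as a person column, all of those columns share one data type, and the column names contain no quote or backtick characters.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.first

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stack-unpivot-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (100, 200, 150, "type1"),
  (200, 200, 300, "type2")
).toDF("tom", "dick", "harry", "type")

// Assumption: every column except the pivot column is a person column.
val personCols = df.columns.filter(_ != "type")

// Builds: stack(3, 'tom', `tom`, 'dick', `dick`, 'harry', `harry`) as (person, value)
val stackExpr = personCols
  .map(c => s"'$c', `$c`")
  .mkString(s"stack(${personCols.length}, ", ", ", ") as (person, value)")

val result = df
  .selectExpr("`type`", stackExpr) // long format: type, person, value
  .groupBy("person")
  .pivot("type")
  .agg(first("value"))

result.show()
```

Because the column list is read from df.columns, adding a fourth person column to the input would not require touching the code.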

Answer 2

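Here is an approach that collects the name and value of each column to unpivot, together with the value of the pivot column, into a struct, flattens the result, and then performs a groupBy/pivot aggregation: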

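import org.apache.spark.sql.functions.{array, col, explode, first, lit, struct}
// Outside spark-shell, also: import spark.implicits._ (needed for toDF and the $"..." syntax)

val df = Seq(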
  (100, 200, 150, "type1"),
  (200, 200, 300, "type2")
).toDF("tom", "dick", "harry", "type")

val colsToUnpivot = Array("tom", "dick", "harry")
val colToPivot = "type"

val structCols = colsToUnpivot.map(cu => struct(
  lit(cu).as("name"), col(cu).as("cu"), col(colToPivot).as("cp")
))

df.
  withColumn("flattened", explode(array(structCols: _*))).
  groupBy($"flattened.name").pivot($"flattened.cp").agg(first($"flattened.cu")).
  show
// +-----+-----+-----+
// | name|type1|type2|
// +-----+-----+-----+
// |harry|  150|  300|
// | dick|  200|  200|
// |  tom|  100|  200|
// +-----+-----+-----+
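The question asks for something dynamic, whereas the snippet above lists tom, dick and harry by hand. A minimal variation, assuming that every column other than the pivot column should be unpivoted, derives the list from df.columns. The local SparkSession, its app name and the final rename to "person" are additions for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, first, lit, struct}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dynamic-unpivot-sketch") // hypothetical name
  .getOrCreate()
import spark.implicits._

val df = Seq(
  (100, 200, 150, "type1"),
  (200, 200, 300, "type2")
).toDF("tom", "dick", "harry", "type")

val colToPivot = "type"
// Assumption: everything that is not the pivot column gets unpivoted.
val colsToUnpivot = df.columns.filter(_ != colToPivot)

// One struct per unpivoted column: (column name, its value, pivot column value).
val structCols = colsToUnpivot.map(cu => struct(
  lit(cu).as("name"), col(cu).as("cu"), col(colToPivot).as("cp")
))

val result = df
  .withColumn("flattened", explode(array(structCols: _*))) // one row per (input row, person)
  .groupBy($"flattened.name")
  .pivot($"flattened.cp")
  .agg(first($"flattened.cu"))
  .withColumnRenamed("name", "person") // match the header requested in the question

result.show()
```

As in the original, array(...) requires all the struct values to share one data type, so the unpivoted columns must all be of the same type.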

Partially transposing/pivoting a DataFrame
