Convert several columns into rows in a Spark DataFrame

Date: 2018-03-23 05:57:51

Tags: scala apache-spark

Take the following DataFrame as an example:

+---+----------+---------------+
|id |fee_amount|discount_amount|
+---+----------+---------------+
|1  |10.0      |5.0            |
|2  |20.0      |3.0            |
+---+----------+---------------+

I would like to transform it into the following:

+---+-----------+---------------+
|id |amount_type|discount_amount|
+---+-----------+---------------+
|1  |fee        |10.0           |
|1  |discount   |5.0            |
|2  |fee        |20.0           |
|2  |discount   |3.0            |
+---+-----------+---------------+

I am fine with simply doubling the number of rows.

I just want one column that holds the amount value and another column that holds the amount type. In the example above, the columns that need to be transposed are fee_amount and discount_amount. Can this be done with Spark DataFrames?

1 answer:

Answer 0 (score: 2)

One solution is to create an array from the fee_amount and discount_amount columns and explode it, which emits one row per array element:

import org.apache.spark.sql.functions._
import spark.implicits._ // assumes an existing SparkSession named spark

val df = Seq(
  (1, 10.00, 5.0),
  (2, 20.00, 3.0)
).toDF("id", "fee_amount", "discount_amount")

// posexplode produces two columns: "pos" (the array index) and "col" (the value)
val result = df.select($"id", posexplode(array($"fee_amount", $"discount_amount")))

// Now replace the exploded index 0 with "fee" and 1 with "discount", then tidy up
result.withColumn("amount_type", when($"pos" === 0, "fee").otherwise("discount"))
  .drop("pos")
  .withColumnRenamed("col", "discount_amount")
  .show(false)

Output:

+---+---------------+-----------+
|id |discount_amount|amount_type|
+---+---------------+-----------+
|1  |10.0           |fee        |
|1  |5.0            |discount   |
|2  |20.0           |fee        |
|2  |3.0            |discount   |
+---+---------------+-----------+
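As a side note, Spark SQL's built-in `stack` function can express the same wide-to-long reshape without building an array and calling posexplode. A minimal sketch, assuming the same `df` defined above (the output column name `discount_amount` is kept only to match the asker's desired layout):

```scala
// stack(n, label1, value1, ..., labelN, valueN) emits n rows per input row
val stacked = df.selectExpr(
  "id",
  "stack(2, 'fee', fee_amount, 'discount', discount_amount) as (amount_type, discount_amount)"
)
stacked.show(false)
```

This avoids the extra `withColumn`/`drop`/`withColumnRenamed` steps, since the labels and output column names are declared inline.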

Hope this helps!