Take the following DataFrame as an example:
+---+----------+---------------+
|id |fee_amount|discount_amount|
+---+----------+---------------+
|1  |10.0      |5.0            |
|2  |20.0      |3.0            |
+---+----------+---------------+
I would like to transform the DataFrame above into the following:
+---+-----------+---------------+
|id |amount_type|discount_amount|
+---+-----------+---------------+
|1  |fee        |10.0           |
|1  |discount   |5.0            |
|2  |fee        |20.0           |
|2  |discount   |3.0            |
+---+-----------+---------------+
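This simply doubles the number of rows, which is fine with me.
I just want one column that stores the amount value and another that stores the amount type. In the example above, I know the names of the columns that need to be transposed, namely fee_amount and discount_amount. Can this be done with a Spark DataFrame?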
Answer 0 (score: 2)
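One solution is to build an array from the fee_amount and discount_amount columns and explode it with posexplode, which emits one row per array element along with that element's position in the array: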
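import org.apache.spark.sql.functions._
// In spark-shell these implicits are already in scope; elsewhere they come
// from your SparkSession (assumed here to be named `spark`).
import spark.implicits._

val df = Seq(
  (1, 10.00, 5.0),
  (2, 20.00, 3.0)
).toDF("id", "fee_amount", "discount_amount")

// posexplode yields two columns: "pos" (index in the array) and "col" (the value)
val result = df.select($"id", posexplode(array($"fee_amount", $"discount_amount")))

// Map position 0 to "fee" and position 1 to "discount", then tidy up the columns
result.withColumn("amount_type", when($"pos" === 0, "fee").otherwise("discount"))
  .drop("pos")
  .withColumnRenamed("col", "discount_amount")
  .show(false)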
Output:
+---+---------------+-----------+
|id |discount_amount|amount_type|
+---+---------------+-----------+
|1 |10.0 |fee |
|1 |5.0 |discount |
|2 |20.0 |fee |
|2 |3.0 |discount |
+---+---------------+-----------+
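If there are more than two columns to transpose, the when/otherwise mapping gets unwieldy. The same array-and-explode idea can be sketched for an arbitrary list of column names by pairing each value with its label inside a struct. This is only a rough sketch: the helper name unpivot, the object name UnpivotSketch, the output column name amount, and the choice to derive the label by stripping an "_amount" suffix are all assumptions made for illustration, and every value column is cast to double so the array elements share one type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object UnpivotSketch {
  // Hypothetical helper: turns each column in valueCols into its own row,
  // keeping idCol and adding "amount_type" and "amount" columns.
  def unpivot(df: DataFrame, idCol: String, valueCols: Seq[String]): DataFrame = {
    // One struct per source column: (label, value). The label is the column
    // name with an assumed "_amount" suffix removed.
    val pairs = valueCols.map { c =>
      struct(
        lit(c.stripSuffix("_amount")).as("amount_type"),
        col(c).cast("double").as("amount")
      )
    }
    // explode produces one output row per struct in the array
    df.select(col(idCol), explode(array(pairs: _*)).as("kv"))
      .select(col(idCol), col("kv.amount_type"), col("kv.amount"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for demonstration purposes only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpivot-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, 10.00, 5.0),
      (2, 20.00, 3.0)
    ).toDF("id", "fee_amount", "discount_amount")

    unpivot(df, "id", Seq("fee_amount", "discount_amount")).show(false)

    spark.stop()
  }
}
```

Because the label travels with the value inside the struct, no position-to-name mapping is needed, and adding another column is just a matter of extending the list passed to the helper.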
Hope this helps!