Question

I have a DataFrame like the following:

|Name|Date|Length|Width|Height|Other_columns...|
|----|----|------|-----|------|----------------|
|foo |bar | 0.5  | 0.6 | 0.7  |................|

I need to explode it by the columns Length, Width and Height, and create a Dimension column that records which column each exploded value came from.

The final dataset should look like this:

|Name|Date|Value|Dimension|
|----|----|-----|---------|
|foo |bar | 0.5 | Length  |
|foo |bar | 0.6 | Width   |
|foo |bar | 0.7 | Height  |

I have figured out the first part of the task, the explode itself. Either of these snippets works fine:

val res = params
  .select("Name", "Date", "Length", "Width", "Height")
  .withColumn("Value", explode(array("Length", "Width", "Height")))
  .drop("Length", "Width", "Height")

or

val res = params.select(col("Name"), col("Date"), explode(array("Length", "Width", "Height")).as("Value"))

But I don't know how to add the Dimension column with its corresponding values.

Any help is much appreciated :)

Answer 1

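One approach is to first use a UDF to pair each dimension value with its label as a tuple, and then explode the resulting array of tuples: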

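// Assumes a SparkSession named `spark` is in scope, as in spark-shell.
import org.apache.spark.sql.functions.{explode, udf}
import spark.implicits._

val df = Seq(
  ("foo", "bar", 0.5, 0.6, 0.7)
).toDF("Name", "Date", "Length", "Width", "Height")

// Pairs each value with its label; Spark exposes the tuple fields as _1 and _2.
val zipDimension = udf(
  (l: Double, w: Double, h: Double) => Seq( (l, "Length"), (w, "Width"), (h, "Height") )
)

// Explode the array of (value, label) pairs into one row per pair.
val df2 = df.
  withColumn("Temp", explode( zipDimension($"Length", $"Width", $"Height") )).
  select($"Name", $"Date", $"Temp._1".as("Value"), $"Temp._2".as("Dimension"))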

df2.show
+----+----+-----+---------+
|Name|Date|Value|Dimension|
+----+----+-----+---------+
| foo| bar|  0.5|   Length|
| foo| bar|  0.6|    Width|
| foo| bar|  0.7|   Height|
+----+----+-----+---------+
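
The same pairing can probably be done without a UDF by building an array of structs from Spark's built-in `array`, `struct` and `lit` functions and exploding that. The following is a minimal sketch, not a definitive implementation: the object name `UnpivotSketch`, the helper `unpivot`, and the local SparkSession setup are illustrative assumptions that are not part of the original answer, and it assumes all dimension columns share the same type (here Double), since `array` needs uniform element types.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Hypothetical helper object, named here only for illustration.
object UnpivotSketch {

  // Turns each column listed in `dims` into its own row, keeping Name and Date.
  // Assumes every column in `dims` has the same data type.
  def unpivot(df: DataFrame, dims: Seq[String]): DataFrame = {
    // One struct per dimension: the column's value plus its name as a literal.
    val pairs = array(
      dims.map(d => struct(col(d).as("Value"), lit(d).as("Dimension"))): _*
    )
    df.select(col("Name"), col("Date"), explode(pairs).as("Temp"))
      .select(col("Name"), col("Date"), col("Temp.Value"), col("Temp.Dimension"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unpivot-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("foo", "bar", 0.5, 0.6, 0.7)
    ).toDF("Name", "Date", "Length", "Width", "Height")

    // Expected to yield one row per dimension, as in the table above.
    unpivot(df, Seq("Length", "Width", "Height")).show()

    spark.stop()
  }
}
```

Staying with built-in functions keeps the expression visible to Spark's optimizer, which a UDF is not, and the list of dimension columns becomes a plain `Seq[String]` that can be extended without changing a function signature.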
