Spark - Avoiding a mutable DataFrame

Date: 2018-05-09 06:35:35

Tags: scala apache-spark dataframe functional-programming apache-spark-sql

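Suppose a DataFrame df contains a column c0.

I need to add n columns, each produced by an operation on c0 (for example, adding the literal 2 to c0, which I assume is IntegerType).

Currently I do it like this: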

var df = /* my data source */
val n = 5 // example
// withColumn expects the column name as a String, so use the s interpolator rather than $"..."
(1 to n).foreach { i => df = df.withColumn(s"c_$i", /* performs some computation on column c0*/) }

How can I avoid the mutable DataFrame (the var), and what can I use instead of foreach? Thanks.

1 answer:

Answer 0 (score: 3):

You can use foldLeft together with withColumn to create the n new columns, like this:

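// Imports needed outside spark-shell; assumes a SparkSession named spark is in scope
import org.apache.spark.sql.functions.lit
import spark.implicits._

// Demo data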
val df = Seq(
  (1, "a"),
  (2, "b"),
  (3, "c")
).toDF("id", "name")


val n = 5

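// Use foldLeft to add one new column per number, each holding that number as a literal value.
// df is the initial accumulator; every step returns a new DataFrame, so no var is needed.
val newDF = (1 to n).foldLeft(df) { (tempDF, number) =>
  tempDF.withColumn(number.toString, lit(number))
}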

newDF.show(false)

Output:

+---+----+---+---+---+---+---+
|id |name|1  |2  |3  |4  |5  |
+---+----+---+---+---+---+---+
|1  |a   |1  |2  |3  |4  |5  |
|2  |b   |1  |2  |3  |4  |5  |
|3  |c   |1  |2  |3  |4  |5  |
+---+----+---+---+---+---+---+
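The demo above adds constant columns. To get closer to the case in the question, here is a minimal sketch that derives each new column from c0. The computation (c0 + i), the local session settings, and the names folded, newCols and selected are assumptions made for illustration, not part of the original question or answer. It is written script-style, as you would paste it into spark-shell.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Local session for the demo only (assumption); in spark-shell this returns the existing session.
val spark = SparkSession.builder().master("local[1]").appName("fold-demo").getOrCreate()
import spark.implicits._

// Demo data with an integer column c0, as described in the question.
val df: DataFrame = Seq(1, 2, 3).toDF("c0")
val n = 3

// foldLeft threads an immutable DataFrame through each step, so no var is needed.
val folded: DataFrame = (1 to n).foldLeft(df) { (acc, i) =>
  acc.withColumn(s"c_$i", col("c0") + lit(i)) // hypothetical computation: c0 + i
}

// Alternative: build all new column expressions first, then project once with select.
val newCols = (1 to n).map(i => (col("c0") + lit(i)).as(s"c_$i"))
val selected: DataFrame = df.select(col("*") +: newCols: _*)

folded.show(false)
```

Both versions should produce the same columns here. For a small n the foldLeft form is probably the most readable; when n is large, the single select may be preferable, since the Spark API documentation notes that calling withColumn many times in a loop can generate large query plans.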

Hope this helps!