Question

I have a requirement to add 5 columns (to an existing DF), one for each of 5 months of the year.

The existing DF looks like this:

EId EName Esal
1   abhi  1100
2   raj   300
3   nanu  400
4   ram   500

The output should look like this:

EId EName Esal Jan  Feb  March April May  
1   abhi  1100 1100 1100 1100  1100  1100 
2   raj   300  300  300  300   300   300  
3   nanu  400  400  400  400   400   400
4   ram   500  500  500  500   500   500

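I can do this with withColumn one column at a time, but that is tedious and takes a lot of time.

Is there a way to run some kind of loop that keeps adding columns until my list of months is exhausted?

Thanks a lot in advance.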

Answer 1

You can use foldLeft. You need to create a List of the columns you want to add.

df.show
+---+----+----+
| id|name| sal|
+---+----+----+
|  1|   A|1100|
+---+----+----+

val list = List("Jan", "Feb" , "Mar", "Apr") // ... you get the idea

list.foldLeft(df)((df, month) => df.withColumn(month , $"sal" ) ).show
+---+----+----+----+----+----+----+
| id|name| sal| Jan| Feb| Mar| Apr|
+---+----+----+----+----+----+----+
|  1|   A|1100|1100|1100|1100|1100|
+---+----+----+----+----+----+----+

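So basically what happens here is that you fold over the sequence you created, starting with the original DataFrame, and apply the transformation as you traverse the list. Each step passes the DataFrame returned by the previous withColumn call on to the next one.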
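To connect this back to the data in the question, here is a minimal, self-contained sketch. The object name AddMonthColumns, the helper names addMonthsFold and addMonthsSelect, and the local SparkSession settings are assumptions made for illustration and do not come from the original posts. The second helper swaps in a different technique, a single select, which should give the same columns without chaining one projection per column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AddMonthColumns {
  // foldLeft version: start from df and add one copy of Esal per month name.
  def addMonthsFold(df: DataFrame, months: Seq[String]): DataFrame =
    months.foldLeft(df)((acc, month) => acc.withColumn(month, col("Esal")))

  // Alternative (not from the answer): build every column up front and select once.
  def addMonthsSelect(df: DataFrame, months: Seq[String]): DataFrame = {
    val existing = df.columns.map(col).toSeq
    val added = months.map(month => col("Esal").as(month))
    df.select(existing ++ added: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("add-month-columns")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "abhi", 1100), (2, "raj", 300), (3, "nanu", 400), (4, "ram", 500))
      .toDF("EId", "EName", "Esal")
    val months = Seq("Jan", "Feb", "March", "April", "May")

    addMonthsFold(df, months).show()
    addMonthsSelect(df, months).show()

    spark.stop()
  }
}
```

For five columns either helper is fine. With a very long list of columns, the single select may be the safer choice, since each withColumn call adds another projection to the query plan.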

Answer 2

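Yes, you can do the same with foldLeft. foldLeft traverses the elements of a collection from left to right, starting from an initial value that you supply.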
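The left-to-right traversal can be seen in isolation with plain Scala collections, no Spark required. In this sketch a Vector of column names is a toy stand-in for a DataFrame, and the object name FoldLeftOrder and the helper addColumns are hypothetical names chosen for illustration.

```scala
object FoldLeftOrder {
  // Toy stand-in for a DataFrame: only the ordered column names are tracked.
  // foldLeft starts from `start` and appends each name in list order.
  def addColumns(start: Vector[String], names: List[String]): Vector[String] =
    names.foldLeft(start)((cols, name) => cols :+ name)

  def main(args: Array[String]): Unit = {
    val result = addColumns(Vector("EId", "EName", "Esal"), List("Jan", "Feb", "March"))
    println(result.mkString(", ")) // prints: EId, EName, Esal, Jan, Feb, March
  }
}
```

The accumulator plays the role that the intermediate DataFrame plays in the Spark version: each step receives the result of the previous step and returns a new value.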

So you can store the required columns in a List(). For example:

// Assumes a SparkSession named spark with `import spark.implicits._` in scope (needed for toDF).
val BazarDF = Seq(
  ("Veg", "tomato", 1.99),
  ("Veg", "potato", 0.45),
  ("Fruit", "apple", 0.99),
  ("Fruit", "pineapple", 2.59)
).toDF("Type", "Item", "Price")

Create a list of column names and values (this example uses null values):

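import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// Each pair is (column name, column expression); here a null literal cast to StringType.
// Note: lit("null") would store the text "null", and .as(...) only sets an alias, not a type.
val ColNameWithDatatype = List(
  ("Jan", lit(null).cast(StringType)),
  ("Feb", lit(null).cast(StringType))
)

// Fold over the list, adding one column to the running DataFrame per element.
val BazarWithColumnDF1 = ColNameWithDatatype.foldLeft(BazarDF) {
  case (tempDF, (colName, colExpr)) => tempDF.withColumn(colName, colExpr)
}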

You can see an example here.
