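I have a DataFrame "df" and a list "lt", as shown below. I want to add the list as a new column in the DataFrame ("df") so that I get the result below. Please suggest the most efficient way to do this.
Input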
df =>
+---+-----+
| id| temp|
+---+-----+
|  1|tmp01|
|  2|tmp02|
|  3|tmp03|
|  4|tmp04|
+---+-----+
lt =>
List(1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04)
Output
+---+-----+----------------------------------+
| id| temp|new_col                           |
+---+-----+----------------------------------+
|  1|tmp01|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  2|tmp02|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  3|tmp03|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  4|tmp04|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
+---+-----+----------------------------------+
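Answer 0 (score: 1)
You can use the approach below. I converted the list to a String and added it as a new column in the DataFrame. Note that mkString needs the ", " separator to produce the comma-separated output shown. Please check the following code:
import org.apache.spark.sql.functions.lit
df.withColumn("new_col", lit(lt.mkString(", "))).show(false)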
+---+-----+----------------------------------+
| id| temp|new_col                           |
+---+-----+----------------------------------+
|  1|tmp01|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  2|tmp02|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  3|tmp03|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
|  4|tmp04|1#tmp01, 6#tmp06, 9#tmp09, 4#tmp04|
+---+-----+----------------------------------+
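If you would rather keep the list as a real array column instead of a single string, something like the following might work. This is only a sketch: it assumes Spark 2.2 or later (for typedLit), it builds its own local SparkSession, and the df and lt values are recreated here from the question just so the snippet runs on its own in spark-shell or as a script with Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, typedLit}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("list-as-column").getOrCreate()
import spark.implicits._

// Sample data recreated from the question (the id type is an assumption).
val df = Seq((1, "tmp01"), (2, "tmp02"), (3, "tmp03"), (4, "tmp04")).toDF("id", "temp")
val lt = List("1#tmp01", "6#tmp06", "9#tmp09", "4#tmp04")

// Option 1: one string per row, list elements joined with ", ".
val asString = df.withColumn("new_col", lit(lt.mkString(", ")))

// Option 2: keep the list as an array<string> column on every row.
val asArray = df.withColumn("new_col", typedLit(lt))

asString.show(false)
asArray.printSchema()
```

Both variants attach the same constant to every row without a join or shuffle, so either should be cheap. The array form is easier to work with later if you need functions such as array_contains or explode.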
Answer 1 (score: -1)
You need to put tuples in the list:
// toDF on a local collection requires: import spark.implicits._
List(("1","tmp01","a"),("2","tmp06","b"),("3","tmp09","c"),("4","tmp04","d"))
.toDF("id","temp","new_col")
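Or
// Caution: Spark typically fails with an AnalysisException here,
// because col("row1") belongs to a different DataFrame than yourDf.
yourDf.withColumn("new_col", List("a", "b", "c", "d")
.toDF("row1")
.col("row1"))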
This solution builds the new column with concat (both columns should be strings):
import org.apache.spark.sql.functions._
yourDf.withColumn("new_col", concat(col("id"),col("temp")))
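As far as I know, withColumn only accepts columns derived from the DataFrame it is called on, so the variant in this answer that pulls a column out of a second DataFrame usually will not run. If the goal is a different value per row, a join on a shared key is one way to get there. The sketch below is an assumption-heavy illustration: yourDf is recreated with integer ids, and extra (with its id to new_col mapping) is a hypothetical lookup table that is not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("join-new-col").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for yourDf, using the rows from the question.
val yourDf = Seq((1, "tmp01"), (2, "tmp02"), (3, "tmp03"), (4, "tmp04")).toDF("id", "temp")

// Hypothetical per-row values, keyed by the same id column.
val extra = Seq((1, "a"), (2, "b"), (3, "c"), (4, "d")).toDF("id", "new_col")

// Left join keeps every row of yourDf; ids missing from extra get null in new_col.
val joined = yourDf.join(extra, Seq("id"), "left")

joined.orderBy("id").show(false)
```

Joining on Seq("id") keeps a single id column in the result. If the new values have no natural key, you would first need to give both sides an index, which is more involved and not shown here.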