Add a new column with a constant value as a list

Time: 2018-03-27 07:39:44

Tags: scala apache-spark

I am creating a list like this:
var transactionList = result.select(col("transaction_id")).distinct().collect().map(_(0)).toList

I want to insert "transactionList" into the DataFrame as a column and then explode it.

I tried something like:

df.withColumn("transactionList" , ArrayType(for (id <- transactionList) lit(id)) 

but it does not work.

1 answer:

Answer 0: (score: 1)

You should also replace .map(_(0)) with .map(_.getString(0)), so that the collected values are typed as String rather than Any:

result.select(col("transaction_id")).distinct().collect().map(_.getString(0))

You can use lit to convert a literal value into a Column:

df.withColumn("transactionList", lit(transactionList))

If you have transactionList = List("a", "b"), this adds a new column transactionList to every row as an array with the values (a, b).

/**
   * Creates a [[Column]] of literal value.
   *
   * The passed in object is returned directly if it is already a [[Column]].
   * If the object is a Scala Symbol, it is converted into a [[Column]] also.
   * Otherwise, a new [[Column]] is created to represent the literal value.
   *
   * @group normal_funcs
   * @since 1.3.0
   */
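
The question also asks to explode the list after adding it. Below is a minimal sketch of both steps, not the answerer's code. It builds the array column with array(...) over one lit(...) per element, because depending on the Spark version, passing a Scala List directly to lit may fail with an unsupported-literal-type error. On Spark 2.2 and later, typedLit(transactionList) should be an alternative. The object name ConstantListColumn, the helper withConstantList, and the sample data are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit}

object ConstantListColumn {
  // Hypothetical helper: adds the same array of strings to every row of df.
  def withConstantList(df: DataFrame, values: Seq[String]): DataFrame =
    // Turn each value into a literal Column, then combine them into one array column.
    df.withColumn("transactionList", array(values.map(lit(_)): _*))

  def main(args: Array[String]): Unit = {
    // Local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("constant-list-column")
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for the question's df and transactionList (assumed data).
    val df = Seq(1, 2).toDF("id")
    val transactionList = List("a", "b")

    val withList = withConstantList(df, transactionList)

    // explode produces one output row per element of the array column.
    val exploded = withList.withColumn("transaction_id", explode(col("transactionList")))

    withList.show()
    exploded.show()
    spark.stop()
  }
}
```

Because every row carries the full list, exploding multiplies the row count by the list length, so this is effectively a cross join between df and the distinct transaction ids.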