Adding a constant value to columns in a Spark DataFrame

Date: 2016-12-28 06:45:52

Tags: scala apache-spark apache-spark-sql

I have a Spark DataFrame that looks like this:

id person age
1  naveen 24

I want to append a constant "del" to every column value except the last column, like this:

id   person    age
1del naveendel 24

Could someone help me with how to achieve this on a Spark DataFrame using Scala?

1 Answer:

Answer 0 (score: 5)

You can use the `concat` function:

```scala
import org.apache.spark.sql.functions._

// add suffix to all but the last column (would work for any number of cols):
val colsWithSuffix = df.columns.dropRight(1).map(c => concat(col(c), lit("del")) as c)
def result = df.select(colsWithSuffix :+ $"age": _*)

result.show()
// +----+---------+---+
// |id  |person   |age|
// +----+---------+---+
// |1del|naveendel|24 |
// +----+---------+---+
```
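To see what the Spark expression computes row by row, here is a plain-Scala sketch with no Spark dependency; the column names and values are hard-coded from the example above, and `dropRight(1)` plays the same role as in the answer (suffix everything except the last column):

```scala
// Plain-Scala model of the per-row transformation: append "del" to every
// value except the one in the last column.
val columns = Seq("id", "person", "age")
val row = Seq("1", "naveen", "24")

val transformed = row.dropRight(1).map(_ + "del") :+ row.last
// transformed == Seq("1del", "naveendel", "24")
```

The same `dropRight(1)`/`map` shape is what the answer applies to `df.columns`, only there each element becomes a Spark `Column` expression instead of a plain string.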

Edit: to accommodate null values as well, you can wrap each column in `coalesce` before appending the suffix; replace the computation of `colsWithSuffix` with:

```scala
val colsWithSuffix = df.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)
```
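The reason for the `coalesce` wrapper is that `concat` returns null if any of its inputs is null. A plain-Scala sketch of the fixed behavior, modelling a null cell as `None` (the row values here are hypothetical, chosen only to show the null case):

```scala
// Model a row where the "person" cell is null.
val row = Seq(Option("1"), None, Option("24"))

// coalesce(col, lit("")) is modelled by getOrElse(""): a null value becomes
// the empty string before "del" is appended, so the result is never null.
val transformed = row.dropRight(1).map(v => v.getOrElse("") + "del") :+
  row.last.getOrElse("")
// transformed == Seq("1del", "del", "24")
```

Without the `getOrElse("")` step (i.e. without `coalesce`), the middle value would stay null instead of becoming "del".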