I have a Spark DataFrame like the one below:

id   person   age
1    naveen   24
I want to append a constant string "del" to every column value except the last column of the DataFrame, so the result looks like this:

id     person      age
1del   naveendel   24

Can someone help me with how to achieve this on a Spark DataFrame using Scala?
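A minimal setup sketch for building this sample DataFrame (assuming a SparkSession named spark is in scope; the name df is just for illustration):

import spark.implicits._

// single-row sample matching the data above
val df = Seq((1, "naveen", 24)).toDF("id", "person", "age")
df.show() // prints the one row: 1, naveen, 24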
Answer 0: (score: 5)
You can use the concat function from org.apache.spark.sql.functions._:

import org.apache.spark.sql.functions._
import spark.implicits._ // assumption: a SparkSession named `spark` is in scope (needed for the $"age" syntax below)
// add suffix to all but last column (would work for any number of cols):
val colsWithSuffix = df.columns.dropRight(1).map(c => concat(col(c), lit("del")) as c)
val result = df.select(colsWithSuffix :+ $"age": _*)
result.show()
// +----+---------+---+
// |id  |person   |age|
// +----+---------+---+
// |1del|naveendel|24 |
// +----+---------+---+
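If you prefer not to hard-code the last column name ($"age"), the same approach can be written against df.columns.last - a sketch under the same imports (the names suffixed and generalResult are just illustrative):

// generalize: suffix every column except whatever the last column happens to be
val suffixed = df.columns.dropRight(1).map(c => concat(col(c), lit("del")).as(c))
val generalResult = df.select(suffixed :+ col(df.columns.last): _*)
generalResult.show(false)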
EDIT: To also handle null values, you can wrap each column in coalesce before appending the suffix - replace the computation of colsWithSuffix with:

val colsWithSuffix = df.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)
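The reason the coalesce wrapper matters: concat returns null as soon as any of its arguments is null, so a null cell would stay null instead of picking up the "del" suffix; coalesce(col(c), lit("")) substitutes an empty string first. A quick check on hypothetical data (dfWithNull and nullSafeCols are illustrative names, same spark session assumed):

// row with a null person to exercise the null handling
val dfWithNull = Seq((1, null.asInstanceOf[String], 24)).toDF("id", "person", "age")

val nullSafeCols = dfWithNull.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)

// person becomes "del" (empty string + suffix) rather than staying null
dfWithNull.select(nullSafeCols :+ col("age"): _*).show(false)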