I have a Spark DataFrame like the one below:

id   person   age
1    naveen   24
I want to append a constant string "del" to every column value except the last column of the DataFrame, so the result looks like this:

id     person      age
1del   naveendel   24

Can someone help me with how to achieve this on a Spark DataFrame using Scala?
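A minimal setup sketch for building this sample DataFrame (assuming a SparkSession named spark is in scope; the name df is just for illustration):

import spark.implicits._

// single-row sample matching the data above
val df = Seq((1, "naveen", 24)).toDF("id", "person", "age")
df.show() // prints the one row: 1, naveen, 24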
Answer 0: (score: 5)
You can use the concat function from org.apache.spark.sql.functions._:

import org.apache.spark.sql.functions._
import spark.implicits._ // assumption: a SparkSession named `spark` is in scope (needed for the $"age" syntax below)
// add suffix to all but last column (would work for any number of cols):
val colsWithSuffix = df.columns.dropRight(1).map(c => concat(col(c), lit("del")) as c)
val result = df.select(colsWithSuffix :+ $"age": _*)
result.show()
// +----+---------+---+
// |id  |person   |age|
// +----+---------+---+
// |1del|naveendel|24 |
// +----+---------+---+
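If you prefer not to hard-code the last column name ($"age"), the same approach can be written against df.columns.last - a sketch under the same imports (the names suffixed and generalResult are just illustrative):

// generalize: suffix every column except whatever the last column happens to be
val suffixed = df.columns.dropRight(1).map(c => concat(col(c), lit("del")).as(c))
val generalResult = df.select(suffixed :+ col(df.columns.last): _*)
generalResult.show(false)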
EDIT: To also handle null values, you can wrap each column in coalesce before appending the suffix - replace the computation of colsWithSuffix with:

val colsWithSuffix = df.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)
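The reason the coalesce wrapper matters: concat returns null as soon as any of its arguments is null, so a null cell would stay null instead of picking up the "del" suffix; coalesce(col(c), lit("")) substitutes an empty string first. A quick check on hypothetical data (dfWithNull and nullSafeCols are illustrative names, same spark session assumed):

// row with a null person to exercise the null handling
val dfWithNull = Seq((1, null.asInstanceOf[String], 24)).toDF("id", "person", "age")

val nullSafeCols = dfWithNull.columns.dropRight(1)
  .map(c => concat(coalesce(col(c), lit("")), lit("del")) as c)

// person becomes "del" (empty string + suffix) rather than staying null
dfWithNull.select(nullSafeCols :+ col("age"): _*).show(false)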