How to rename column headers in a Scala DataFrame

Asked: 2018-11-16 20:19:00

Tags: scala apache-spark dataframe

How can I perform string.replace("fromstr", "tostr") on a Scala DataFrame? As far as I can tell, withColumnRenamed performs the replacement on all columns, not just on the headers.

1 answer:

Answer 0 (score: 1)

withColumnRenamed renames only the column name; the data stays unchanged. If you need to change the row contents, you can use one of the following:

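// Assumes `sparkSession` is an existing SparkSession; its implicits provide toDF and the $"..." column syntax.
import sparkSession.implicits._
import org.apache.spark.sql.functions._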

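val inputDf = Seq("to_be", "misc").toDF("c1")

// regexp_replace: rewrite values that match the anchored pattern, then expose the result as c1 again.
val resultd1Df = inputDf
  .withColumn("c2", regexp_replace($"c1", "^to_be$", "not_to_be"))
  .select($"c2".as("c1"))
resultd1Df.show()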

// when/otherwise: replace on exact equality and keep every other value as-is.
val resultd2Df = inputDf
  .withColumn("c2", when($"c1" === "to_be", "not_to_be").otherwise($"c1"))
  .select($"c2".as("c1"))
resultd2Df.show()

// UDF driven by a lookup map: values without a mapping entry pass through unchanged.
def replace(mapping: Map[String, String]) = udf(
  (from: String) => mapping.getOrElse(from, from)
)

val resultd3Df = inputDf
  .withColumn("c2", replace(Map("to_be" -> "not_to_be"))($"c1"))
  .select($"c2".as("c1"))
resultd3Df.show()

Input DataFrame:

+-----+
|   c1|
+-----+
|to_be|
| misc|
+-----+

Result DataFrame:

+---------+
|       c1|
+---------+
|not_to_be|
|     misc|
+---------+
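
For the header side of the question, here is a minimal sketch of applying String.replace to the column names only, leaving the row data untouched. The helper name renameHeaders, the object RenameHeaders, the local SparkSession, and the sample column names fromstr_c1 and fromstr_c2 are assumptions made for illustration; they do not come from the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameHeaders {
  // Hypothetical helper: applies String.replace to every column name.
  // toDF(colNames: _*) only relabels the columns; row values are not modified.
  def renameHeaders(df: DataFrame, from: String, to: String): DataFrame =
    df.toDF(df.columns.map(_.replace(from, to)): _*)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption, not part of the original answer).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rename-headers")
      .getOrCreate()
    import spark.implicits._

    val inputDf = Seq(("to_be", 1), ("misc", 2)).toDF("fromstr_c1", "fromstr_c2")

    // Rename one column by its exact name.
    val oneRenamed = inputDf.withColumnRenamed("fromstr_c1", "tostr_c1")
    oneRenamed.show()

    // Rename every column whose name contains "fromstr".
    val allRenamed = renameHeaders(inputDf, "fromstr", "tostr")
    allRenamed.show()

    spark.stop()
  }
}
```

Because only the names are remapped, a value such as to_be in the rows stays the same even if it happens to contain the replaced substring.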

You can find the list of available Spark functions in the API documentation for org.apache.spark.sql.functions.