How do I perform string.replace("fromstr", "tostr") on a Scala DataFrame? As far as I can tell, withColumnRenamed performs the replacement on all columns, not just the header.
Answer 0 (score: 1)
withColumnRenamed only renames the column; the data stays unchanged. If you need to change the row contents, you can use one of the following:
// Assumes a SparkSession named sparkSession is already in scope.
import sparkSession.implicits._
import org.apache.spark.sql.functions._

val inputDf = Seq("to_be", "misc").toDF("c1")

// Option 1: regexp_replace; the anchors ^ and $ restrict the match to the whole value.
val resultd1Df = inputDf
  .withColumn("c2", regexp_replace($"c1", "^to_be$", "not_to_be"))
  .select($"c2".as("c1"))
resultd1Df.show()

// Option 2: when/otherwise with an exact equality test.
val resultd2Df = inputDf
  .withColumn("c2", when($"c1" === "to_be", "not_to_be").otherwise($"c1"))
  .select($"c2".as("c1"))
resultd2Df.show()

// Option 3: a UDF that looks each value up in a mapping and keeps unmapped values as they are.
def replace(mapping: Map[String, String]) = udf(
  (from: String) => mapping.getOrElse(from, from)
)
val resultd3Df = inputDf
  .withColumn("c2", replace(Map("to_be" -> "not_to_be"))($"c1"))
  .select($"c2".as("c1"))
resultd3Df.show()
Input DataFrame:
+-----+
|   c1|
+-----+
|to_be|
| misc|
+-----+
Result DataFrame:
+---------+
|       c1|
+---------+
|not_to_be|
|     misc|
+---------+
The list of available Spark functions is in the API documentation for org.apache.spark.sql.functions.
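One detail worth spelling out: regexp_replace treats its pattern as a Java regular expression, so the anchored pattern ^to_be$ only rewrites a cell whose entire value is to_be. That is narrower than what String.replace does, since String.replace substitutes a literal substring wherever it occurs. The following is a minimal sketch in plain Scala (no Spark involved; the object and method names are hypothetical and chosen only for illustration) that contrasts the two behaviours using the same regex engine:

```scala
import java.util.regex.{Matcher, Pattern}

object ReplaceSketch {
  // Anchored pattern, as in the answer: rewrites only a value that is exactly "to_be".
  def replaceWhole(s: String): String =
    s.replaceAll("^to_be$", "not_to_be")

  // Literal substring replacement: quoting both arguments makes regex
  // metacharacters in `from` and '$' or '\' in `to` be taken literally.
  def replaceLiteral(s: String, from: String, to: String): String =
    s.replaceAll(Pattern.quote(from), Matcher.quoteReplacement(to))
}
```

Assuming Spark hands the pattern and replacement to the Java regex engine unchanged, passing Pattern.quote(from) and Matcher.quoteReplacement(to) to regexp_replace should give the literal, substring-level behaviour on a column; it is worth checking this against your Spark version before relying on it.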