Input DataFrame:
val ds = Seq((1,"play framework"),
(2,"spark framework"),
(3,"spring framework ")).toDF("id","subject")
I want the subject column in title case, as shown below:
val ds = Seq((1,"Play Framework"),
(2,"Spark Framework"),
(3,"Spring Framework ")).toDF("id","subject")
I can use the lower function from org.apache.spark.sql.functions, as in ds.select($"subject", lower($"subject")).show, to convert the column to lowercase. But how can I get the title-case result I expect?
Answer 0 (score: 2)
There is a built-in function called initcap that does what you are asking for:
import org.apache.spark.sql.functions._
ds.withColumn("subject", initcap(col("subject"))).show(false)
public static Column initcap(Column e) Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
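The documentation line above only mentions the first letter of each word. The following is a minimal sketch for checking the behaviour on mixed-case input. The `InitcapDemo` object, the `titleCase` helper, the `local[1]` master and the app name are illustrative assumptions, not part of the original answer. As far as I know, initcap also lowercases the remaining letters of each word and leaves whitespace (including a trailing space) in place.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, initcap}

object InitcapDemo {
  // Hypothetical helper: applies initcap to each input string and collects the results.
  def titleCase(spark: SparkSession, subjects: Seq[String]): Seq[String] = {
    import spark.implicits._
    subjects.toDF("subject")
      .select(initcap(col("subject")).alias("subject"))
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark session is enough for this check.
    val spark = SparkSession.builder().master("local[1]").appName("initcap-demo").getOrCreate()
    titleCase(spark, Seq("play framework", "sPARK framework", "spring framework "))
      .foreach(s => println(s"[$s]")) // brackets make any trailing space visible
    spark.stop()
  }
}
```

Printing inside brackets makes it easy to see whether the trailing space of the third row survives.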
Answer 1 (score: 1)
You can do it like this:
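import org.apache.spark.sql.functions.{initcap, udf}
// Capitalize each space-separated word; split(" ") drops trailing empty tokens, so a trailing space is removed
val capitalizeUDF = udf((str: String) => str.split(" ").map(word => word.trim.capitalize).mkString(" "))
ds.select($"id", capitalizeUDF($"subject").alias("subject")).show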
or
ds.select($"id",initcap($"subject").alias("subject")).show
Sample output:
+---+----------------+
| id|         subject|
+---+----------------+
|  1|  Play Framework|
|  2| Spark Framework|
|  3|Spring Framework|
+---+----------------+
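One caveat with the UDF approach is that a null subject would throw a NullPointerException inside the lambda, and `capitalize` leaves the rest of each word unchanged. The following is a minimal sketch of a null-safe variant in plain Scala. The `TitleCase` object and its `titleCase` method are hypothetical names introduced for illustration. Unlike initcap, this version trims the input and collapses runs of whitespace into a single space.

```scala
object TitleCase {
  // Hypothetical helper: title-cases whitespace-separated words.
  // Returns null for null input instead of throwing.
  def titleCase(s: String): String =
    Option(s)
      .map { str =>
        str.trim
          .split("\\s+")               // split on any run of whitespace
          .filter(_.nonEmpty)          // an empty input yields one empty token; drop it
          .map(_.toLowerCase.capitalize)
          .mkString(" ")
      }
      .orNull

  def main(args: Array[String]): Unit = {
    Seq("play framework", "sPARK   framework", "spring framework ", null)
      .foreach(s => println(s"[${titleCase(s)}]"))
  }
}
```

Assuming a Spark session is available, the function could presumably be wrapped with `udf(TitleCase.titleCase _)` and used in place of the UDF above. Where the whitespace-preserving behaviour is acceptable, the built-in initcap is usually the simpler choice.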