How to change a DataFrame column's values to title case in Scala

Time: 2018-06-27 18:55:35

Tags: scala apache-spark dataframe

Input DataFrame:

val ds = Seq((1,"play framework"),
  (2,"spark framework"),
  (3,"spring framework ")).toDF("id","subject")

I want the subject column in title case, as shown below.

 val ds = Seq((1,"Play Framework"),
  (2,"Spark Framework"),
  (3,"Spring Framework ")).toDF("id","subject")

I can convert the column to lowercase with the lower function from org.apache.spark.sql.functions:

ds.select($"subject", lower($"subject")).show

But how can I get the title-case result I expect?

2 Answers:

Answer 0: (score: 2)

There is a built-in function called initcap that does exactly what you want:

import org.apache.spark.sql.functions._
ds.withColumn("subject", initcap(col("subject"))).show(false)

The official documentation says:

public static Column initcap(Column e)
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
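
For readers outside spark-shell, the snippet above can be wrapped into a small standalone program. This is only a sketch: the object name InitcapSketch, the helper withTitleCase, and the local[1] session settings are illustrative assumptions, and Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, initcap}

object InitcapSketch {
  // Hypothetical helper: replaces the subject column with its title-cased form.
  def withTitleCase(df: DataFrame): DataFrame =
    df.withColumn("subject", initcap(col("subject")))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("initcap-sketch")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq of tuples

    val ds = Seq((1, "play framework"),
      (2, "spark framework"),
      (3, "spring framework ")).toDF("id", "subject")

    withTitleCase(ds).show(false)
    spark.stop()
  }
}
```

Note that initcap also lowercases the remaining letters of each word and leaves whitespace untouched, so the trailing space in row 3 is kept, which matches the expected output in the question.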

  

Answer 1: (score: 1)

You can do it like this:

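import org.apache.spark.sql.functions.{initcap, udf}

// Capitalize the first letter of each space-separated word (a null input would throw here).
val capitalizeUDF = udf((str: String) => str.split(" ").map(word => word.trim.capitalize).mkString(" "))

ds.select($"id", capitalizeUDF($"subject").alias("subject")).show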

or

ds.select($"id",initcap($"subject").alias("subject")).show

Sample output:

+---+----------------+
| id|         subject|
+---+----------------+
|  1|  Play Framework|
|  2| Spark Framework|
|  3|Spring Framework|
+---+----------------+
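
The UDF in this answer wraps an ordinary Scala function, so its logic can be checked without Spark. The sketch below mirrors that function body; the object name TitleCaseSketch, the helper name titleCase, and the null guard are additions made for illustration, not part of the original answer.

```scala
object TitleCaseSketch {
  // Hypothetical helper mirroring the UDF body, with a null guard added.
  def titleCase(str: String): String =
    Option(str)
      .map(_.split(" ").map(_.trim.capitalize).mkString(" "))
      .orNull

  def main(args: Array[String]): Unit = {
    println(titleCase("play framework"))                // Play Framework
    // split(" ") discards trailing empty tokens, so the trailing space is lost.
    println("[" + titleCase("spring framework ") + "]") // [Spring Framework]
    // capitalize only touches the first character of each word.
    println(titleCase("sPARK framework"))               // SPARK Framework
    println(titleCase(null))                            // null
  }
}
```

Two differences from initcap follow from this: the split-based version drops the trailing space in row 3 (consistent with the sample output above), and it does not lowercase the remaining letters of each word. If the trailing space in the expected output matters, initcap is likely the closer fit.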