Replacing strings in a df using a dictionary in Scala

Time: 2018-08-03 14:26:56

Tags: scala

I am new to Scala. I am trying to replace parts of strings using a dictionary.

My dictionary is:

val dict = Seq(("fruits", "apples"), ("color", "red"), ("city", "paris"))
  .toDF("old", "new")

+------+------+
|   old|   new|
+------+------+
|fruits|apples|
| color|   red|
|  city| paris|
+------+------+

I then want to translate the fields in a column of another df:

+--------------------------+
|oldCol                    |
+--------------------------+
|I really like fruits      |
|they are colored brightly |
|i live in city!!          |
+--------------------------+

Desired output:

+------------------------+
|newCol                  |
+------------------------+
|I really like apples    |
|they are reded brightly |
|i live in paris!!       |
+------------------------+

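Please help! I tried converting the dictionary to a Map and then using the replaceAllIn() function, but I could not get it to work.

I also tried foldLeft, following this answer: Scala replace an String with a List of Key/Values. Thanks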

1 answer:

Answer 0: (score: 0)

Create a Map from the dict dataframe; you can then do this easily with a udf, as shown below

import org.apache.spark.sql.functions._

//Creating Map from dict dataframe

val oldNewMap=dict.map(row=>row.getString(0)->row.getString(1)).collect.toMap

//Creating udf

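//String.replace matches the key literally; the regex key+"." would also consume the character after the key (e.g. "city!" instead of "city")
val replaceUdf=udf((str:String)=>oldNewMap.foldLeft(str){case (acc,(key,value))=>acc.replace(key, value)})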

//Select old column from oldDf and apply udf 

oldDf.withColumn("newCol",replaceUdf(oldDf.col("oldCol"))).drop("oldCol").show

//Output: 
+--------------------+
|              newCol|
+--------------------+
|I really like apples|
|they are reded br...|
|   i live in paris!!|
+--------------------+
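
The replacement logic inside the udf can be tried without Spark. The following is a minimal plain-Scala sketch, assuming the dictionary has already been collected into a Map as above. The names `DictReplace`, `replaceAllKeys` and `replaceSinglePass` are illustrative and do not come from the original answer. `replaceSinglePass` is an alternative built on `replaceAllIn` (the function mentioned in the question), not the method used in the answer.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

object DictReplace {
  // Same pairs as the dict dataframe in the question.
  val oldNewMap: Map[String, String] =
    Map("fruits" -> "apples", "color" -> "red", "city" -> "paris")

  // Apply every (old -> new) pair in turn. String.replace treats the key as
  // literal text, so regex metacharacters in a key need no escaping.
  def replaceAllKeys(str: String, dict: Map[String, String]): String =
    dict.foldLeft(str) { case (acc, (key, value)) => acc.replace(key, value) }

  // Alternative: build one alternation regex and replace in a single pass,
  // so text produced by one replacement is never matched by another key.
  def replaceSinglePass(str: String, dict: Map[String, String]): String =
    if (dict.isEmpty) str
    else {
      // Longest keys first, so a key that is a prefix of another does not win.
      val alternation = dict.keys.toSeq
        .sortBy(k => -k.length)
        .map(k => Pattern.quote(k))
        .mkString("|")
      new Regex(alternation)
        .replaceAllIn(str, m => Regex.quoteReplacement(dict(m.matched)))
    }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      "I really like fruits",
      "they are colored brightly",
      "i live in city!!"
    )
    rows.map(replaceAllKeys(_, oldNewMap)).foreach(println)
    rows.map(replaceSinglePass(_, oldNewMap)).foreach(println)
  }
}
```

Either function could be wrapped in `udf(...)` in place of the lambda above. Neither handles a null input, so a nullable column would need a guard such as `Option(str)` first.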

Hope this helps