我是Scala的新手。我正在尝试使用字典替换部分字符串。
我的字典是:
val dict = Seq(("fruits", "apples"),("color", "red"), ("city", "paris")).
toDF(List("old", "new").toSeq:_*)
+------+------+
| old| new|
+------+------+
|fruits|apples|
| color| red|
| city| paris|
+------+------+
然后我将翻译另一个df中的列中的字段:
+--------------------------+
|oldCol |
+--------------------------+
|I really like fruits |
|they are colored brightly |
|i live in city!! |
+--------------------------+
所需的输出:
+------------------------+
|newCol |
+------------------------+
|I really like apples |
|they are reded brightly |
|i live in paris!! |
+------------------------+
请帮助!我试图隐瞒字典到地图,然后使用replaceAllIn()函数,但确实无法解决这一问题。
我也按照以下答案尝试了foldleft:Scala replace an String with a List of Key/Values。 谢谢
答案 0 :(得分:0)
从Map
dict
创建一个dataframe
,然后您可以像下面这样使用udf
轻松地做到这一点
import org.apache.spark.sql.functions._
//Creating Map from dict dataframe
val oldNewMap=dict.map(row=>row.getString(0)->row.getString(1)).collect.toMap
//Creating udf
val replaceUdf=udf((str:String)=>oldNewMap.foldLeft (str) {case (acc,(key,value))=>acc.replaceAll(key+".", value).replaceAll(key, value)})
//Select old column from oldDf and apply udf
oldDf.withColumn("newCol",replaceUdf(oldDf.col("oldCol"))).drop("oldCol").show
//Output:
+--------------------+
| newCol|
+--------------------+
|I really like apples|
|they are reded br...|
| i live in paris!!|
+--------------------+
希望这对您有帮助