Search a DataFrame for values from a list and add a column indicating whether any was found

Time: 2018-05-28 15:18:14

Tags: scala dataframe

Here is my df, which has 2 columns:

utid|description
12342|my name is 123 amrud and nitesh
2345|my name is anil
2122|my name is 1234 mohan

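I also have a list such as {"mohan","nitesh"}. I need to check whether the description contains any element of this list: if it does, write "found", otherwise write "not found", in a separate column of the DataFrame. The real list is much larger, about 20k elements. The output DataFrame should look like this: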

utid|description|foundornot
12342|my name is 123 amrud and nitesh|found
2345|my name is anil|not found
2122|my name is 1234 mohan|found

Any help is welcome.

1 Answer:

Answer 0: (score: 1)

You can simply define a udf function that checks the condition and returns the string "found" or "not found":

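// Names to search for in the description column
val list = List("mohan","nitesh")

import org.apache.spark.sql.functions._
// "found" if the description contains any name as a substring; the null check avoids a NullPointerException on null descriptions
def checkUdf = udf((strCol: String) => if (strCol != null && list.exists(strCol.contains)) "found" else "not found")

df.withColumn("foundornot", checkUdf(col("description"))).show(false)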

which should give you

+-----+-------------------------------+----------+
|utid |description                    |foundornot|
+-----+-------------------------------+----------+
|12342|my name is 123 amrud and nitesh|found     |
|2345 |my name is anil                |not found |
|2122 |my name is 1234 mohan          |found     |
+-----+-------------------------------+----------+
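Since the question mentions a list of about 20k names, note that `list.exists(strCol.contains)` scans the whole list for every row. If the names always appear as whole, whitespace-separated words (an assumption, and a change from substring matching to whole-word matching), a set lookup per token may be cheaper. A minimal sketch in plain Scala, where `NameLookup` and `label` are hypothetical names that are not part of Spark:

```scala
object NameLookup {
  // Returns "found" if any whitespace-separated token of the description
  // is in the set of names, otherwise "not found". Null descriptions are
  // treated as "not found".
  def label(description: String, names: Set[String]): String =
    if (description != null && description.split("\\s+").exists(names.contains)) "found"
    else "not found"

  def main(args: Array[String]): Unit = {
    val names = Set("mohan", "nitesh")
    println(label("my name is 123 amrud and nitesh", names)) // found
    println(label("my name is anil", names))                 // not found
    println(label("my name is 1234 mohan", names))           // found
  }
}
```

To use it with the udf above, the udf body could call `NameLookup.label(strCol, names)`, with `names` built once from the list via `list.toSet`. For a large set it may also be worth wrapping it in a Spark broadcast variable so it is shipped to each executor only once.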

I hope the answer is helpful.