Question

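I have a df and I need to check whether any element of a keyword list appears in the description column. If so, I need to put all of the matching keywords, separated by @, into a new column named found.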

My df looks like this:

utid | description
123  | my name is harry and I live in newyork
234  | my neighbour is daniel and he plays hockey

The list is something like list = {harry, daniel, hockey, newyork}

The output should look like this:

utid | description                                | foundornot
123  | my name is harry and I live in newyork     | harry@newyork
234  | my neighbour is daniel and he plays hockey | daniel@hockey

The real list is much larger, about 20k keywords. If no keyword is found, the column should contain NF.
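To pin down the expected behaviour before involving Spark, here is a minimal plain-Scala sketch of the per-row rule. It assumes plain substring matching and keeps the keywords in list order; the names `KeywordSpec` and `foundOrNot` are hypothetical and not part of any library.

```scala
object KeywordSpec {
  // Hypothetical helper: returns the keywords that occur in `description`
  // (plain substring match, in keyword-list order) joined by "@",
  // or "NF" when none of them occur.
  def foundOrNot(description: String, keywords: Seq[String]): String = {
    val found = keywords.filter(k => description.contains(k))
    if (found.isEmpty) "NF" else found.mkString("@")
  }

  def main(args: Array[String]): Unit = {
    val keywords = Seq("harry", "daniel", "hockey", "newyork")
    println(foundOrNot("my name is harry and I live in newyork", keywords))     // harry@newyork
    println(foundOrNot("my neighbour is daniel and he plays hockey", keywords)) // daniel@hockey
    println(foundOrNot("nothing to see here", keywords))                       // NF
  }
}
```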

Answer 1

You can use a udf that checks which elements of list occur in the description column and returns them as a single string separated by @, or the string NF if none of them occur:

import org.apache.spark.sql.functions._

val list = List("harry", "daniel", "hockey", "newyork")

// Keywords contained in the description, joined by "@"; "NF" if none match (or the description is null)
val checkUdf = udf((strCol: String) => {
  val found = if (strCol == null) Nil else list.filter(strCol.contains(_))
  if (found.isEmpty) "NF" else found.mkString("@")
})

df.withColumn("foundornot", checkUdf(col("description"))).show(false)

This should give you:

+----+------------------------------------------+-------------+
|utid|description                               |foundornot   |
+----+------------------------------------------+-------------+
|123 |my name is harry and i live in newyork    |harry@newyork|
|234 |my neighbour is daniel and he plays hockey|daniel@hockey|
+----+------------------------------------------+-------------+
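With about 20k keywords, filtering the whole keyword list for every row means 20k substring scans per row, and a substring check also matches inside longer words (a keyword "ice" would match "price"). A swapped-in alternative, sketched below under the assumption that every keyword is a single lowercase word, is to split each description into words and look them up in a Set. The object and method names are hypothetical. Note that this variant returns matches in description order rather than keyword-list order.

```scala
object TokenKeywordMatcher {
  // Hypothetical helper (assumes single-word, lowercase keywords):
  // lowercases the description, splits it on runs of non-letter/non-digit
  // characters and keeps the tokens found in the keyword Set. Set lookup is
  // effectively constant time, so the cost per row grows with the number of
  // words in the row, not with the size of the keyword list.
  def foundOrNot(description: String, keywords: Set[String]): String = {
    val text = Option(description).getOrElse("") // treat a null description as empty
    val found = text.toLowerCase
      .split("[^\\p{L}\\p{N}]+")
      .filter(keywords.contains)
      .distinct // report each keyword once, at its first occurrence
    if (found.isEmpty) "NF" else found.mkString("@")
  }
}
```

In Spark this function could be wrapped in a udf in the same way as checkUdf, passing a `Set` built once from the keyword list; multi-word keywords would still need the substring approach.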

