How do I check whether the value in one column appears in another column?

Time: 2019-05-17 12:05:56

Tags: python pyspark

I have a Spark SQL DataFrame with three columns:

port    test1        test2
123     apple        ramesh eat apple
436     banana       banana is not a fruit
467     cat
78      tiger        cat is pet

For each row, I want to check whether the test1 value occurs inside the test2 value, and I would like output like this:

port test1  test2                        check
123  apple  ramesh eat apple               1
436  banana banana is not a fruit          1
467  cat                                   0
78   tiger  cat is pet                     0

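3 answers:

Answer 0 (score: 2)

You can solve this with the contains method on a column, which tests whether one string column contains the value of another. Rows where test2 is null fall through to otherwise(0).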

from pyspark.sql.functions import col, when

df = df.withColumn('check', when(col('test2').contains(col('test1')), 1).otherwise(0))
df.show(truncate=False)

+----+------+---------------------+-----+
|port|test1 |test2                |check|
+----+------+---------------------+-----+
|123 |apple |ramesh eat apple     |1    |
|436 |banana|banana is not a fruit|1    |
|467 |cat   |null                 |0    |
|78  |tiger |cat is pet           |0    |
+----+------+---------------------+-----+

Answer 1 (score: 0)

You can use SQL syntax:

from pyspark.sql import functions as F

df.withColumn(
    "check",
    F.expr("case when test2 like concat('%', test1, '%') then 1 else 0 end")
).show()

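Answer 2 (score: -1)

You can use the following (note that this is pandas syntax, so it applies to a pandas DataFrame, not a Spark one):

# Rows whose test2 is missing (NaN/None) get 0 instead of raising a TypeError
df['check'] = df.apply(lambda row: int(isinstance(row.test2, str) and row.test1 in row.test2), axis=1)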
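The row-wise rule that all three answers aim for can be spelled out in plain Python, which may help when checking results by hand. This is only a sketch: the helper name `contains_flag` and the choice to treat a missing value as 0 are assumptions taken from the expected output in the question, not from any library.

```python
from typing import Optional


def contains_flag(needle: Optional[str], haystack: Optional[str]) -> int:
    """Return 1 if needle occurs as a substring of haystack, else 0.

    Assumption: a missing value on either side counts as "no match" (0),
    mirroring the row with an empty test2 in the question.
    """
    if needle is None or haystack is None:
        return 0
    return int(needle in haystack)


# Sample rows from the question as (port, test1, test2) tuples.
rows = [
    (123, "apple", "ramesh eat apple"),
    (436, "banana", "banana is not a fruit"),
    (467, "cat", None),
    (78, "tiger", "cat is pet"),
]

checks = [contains_flag(test1, test2) for _, test1, test2 in rows]
```

Here `checks` lines up with the check column in the desired output, one flag per row in the same order.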