I have a Spark SQL DataFrame with three columns:
port test1 test2
123 apple ramesh eat apple
436 banana banana is not a fruit
467 cat
78 tiger cat is pet
I want to check whether the value in the test1 column occurs within the value in the test2 column, and I want output like this:
port test1 test2 check
123 apple ramesh eat apple 1
436 banana banana is not a fruit 1
467 cat 0
78 tiger cat is pet 0
Answer 0: (score: 2)
You can use the contains function to solve this. It is simple:
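from pyspark.sql.functions import col, when
# contains() yields null when test2 is null, so otherwise(0) maps that row to 0
df = df.withColumn('check', when(col('test2').contains(col('test1')), 1).otherwise(0))
df.show(truncate=False)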
+----+------+---------------------+-----+
|port|test1 |test2 |check|
+----+------+---------------------+-----+
|123 |apple |ramesh eat apple |1 |
|436 |banana|banana is not a fruit|1 |
|467 |cat |null |0 |
|78 |tiger |cat is pet |0 |
+----+------+---------------------+-----+
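The rule behind the check column can also be sanity-checked without a Spark session. The following is a minimal plain-Python sketch, not part of the answer: the helper name `contains_flag` is hypothetical, the rows are copied from the question, and `None` is assumed to stand in for the missing test2 value.

```python
from typing import Optional

# Sample rows copied from the question; None stands in for the missing test2 value.
rows = [
    (123, "apple", "ramesh eat apple"),
    (436, "banana", "banana is not a fruit"),
    (467, "cat", None),
    (78, "tiger", "cat is pet"),
]


def contains_flag(test1: Optional[str], test2: Optional[str]) -> int:
    """Hypothetical helper: 1 if test1 is a substring of test2, else 0 (nulls give 0)."""
    if test1 is None or test2 is None:
        return 0
    return int(test1 in test2)


checks = [contains_flag(t1, t2) for _, t1, t2 in rows]
print(checks)  # [1, 1, 0, 0]
```

The result matches the check column shown above, including the 0 for the row whose test2 is null.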
Answer 1: (score: 0)
You can use SQL syntax:
from pyspark.sql import functions as F
df.withColumn(
    "check",
    F.expr("case when test2 like concat('%', test1, '%') then 1 else 0 end")
).show()
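One caveat with the LIKE approach: `%` and `_` inside test1 act as wildcards, so the result can differ from a plain substring test. The sketch below is only an illustration in plain Python. `like_match` is a hypothetical helper that emulates LIKE with a regular expression and deliberately ignores escape characters, and the sample values are made up.

```python
import re


def like_match(value: str, pattern: str) -> bool:
    """Hypothetical LIKE emulation: % matches any run of characters,
    _ matches exactly one character. Escape characters are not handled."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


# Made-up values: test1 contains an underscore, test2 does not.
test1, test2 = "a_c", "abc"
like_result = like_match(test2, "%" + test1 + "%")
substring_result = test1 in test2
print(like_result, substring_result)  # True False
```

If test1 may contain such characters and a literal match is wanted, a substring test such as contains avoids the issue.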
Answer 2: (score: -1)
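You can use the following, which assumes df is a pandas DataFrame rather than a Spark one:
# isinstance guards against a missing test2 value (None/NaN), which would otherwise raise a TypeError
df['check'] = df.apply(lambda row: int(isinstance(row.test2, str) and row.test1 in row.test2), axis=1)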