I have the following DataFrame in PySpark:
id typename lat1 lon1 lat2 lon2 dist radius
1 aaa 41.2 2.1 41.3 2.2 10 20
1 bbb 41.2 2.1 41.3 2.2 10 20
1 ccc 41.2 2.1 41.3 2.2 10 20
2 aaa 41.1 2.2 41.3 2.2 10 20
2 ccc 41.1 2.2 41.3 2.2 10 20
3 aaa 42.1 2.2 41.3 2.2 22 20
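I want to add a new column, is_inside_radius. However, for rows with the same combination of values, the value 1 should appear only once. This is what I have so far: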
import pyspark.sql.functions as func

# Flags every row where dist <= radius, so duplicate rows all receive 1
df \
    .withColumn("is_inside",
                func.when(
                    (func.col("dist") <= func.col("radius")), 1
                ).otherwise(0))
The expected result is:
id typename lat1 lon1 lat2 lon2 dist radius is_inside
1 aaa 41.2 2.1 41.3 2.2 10 20 1
1 bbb 41.2 2.1 41.3 2.2 10 20 0
1 ccc 41.2 2.1 41.3 2.2 10 20 0
2 aaa 41.1 2.2 41.3 2.2 10 20 1
2 ccc 41.1 2.2 41.3 2.2 10 20 0
3 aaa 42.1 2.2 41.3 2.2 22 20 0
How can I do this?