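I have a DataFrame with latitude and longitude columns. I use a udf to create a new column that checks whether the corresponding zip code is in a list of zip codes. Depending on whether search is defined inside or outside in_borough, I get different results.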
from uszipcode import SearchEngine
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType  # required for the udf return type

# search defined outside the function (module level)
search = SearchEngine(simple_zipcode=True)

lat = 40.77898
long = -73.96925
zipcodes = [10023]

def in_borough(lat, long):
    # uncomment the next line to define search inside the function instead
    # search = SearchEngine(simple_zipcode=True)
    result = search.by_coordinates(lat, long, radius=1, returns=1)
    return (int(result[0].zipcode) in zipcodes) if (len(result) > 0) else False

within_borough = udf(in_borough, BooleanType())
df = spark.createDataFrame([{'lat': lat, 'long': long}])

# calling the function
print(in_borough(lat, long))

# calling the udf
df.select('lat', 'long', within_borough('lat', 'long').alias('within_borough')).show()
Output of the function:
True
Output of the udf with search defined inside the function:
+--------+---------+--------------+
|     lat|     long|within_borough|
+--------+---------+--------------+
|40.77898|-73.96925|          True|
+--------+---------+--------------+
Output of the udf with search defined outside the function:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-165-bbbd186ea9d1> in <module>
61
62 # calling the udf
---> 63 df.select('lat', 'long', within_borough('lat', 'long').alias('within_borough')).show()
[...]
330 raise Py4JError(
Py4JJavaError: An error occurred while calling o3794.showString.
[...]
AttributeError: type object 'weakref' has no attribute '__callback__'
[...]
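One possible reading of the difference, offered as a sketch rather than a confirmed diagnosis: Spark pickles the udf together with the objects it references so that it can ship them to the Python worker processes. When search is created outside in_borough, the SearchEngine instance itself has to be pickled. When it is created inside, only the function is shipped and the engine is built on the worker. The example below uses only the standard library. FakeSearchEngine, can_pickle and get_engine are hypothetical names, and the sqlite3 connection stands in for whatever live resources the real SearchEngine holds (an assumption, not verified against uszipcode).

```python
import pickle
import sqlite3


class FakeSearchEngine:
    """Hypothetical stand-in for an object that wraps a live database handle."""

    def __init__(self):
        # Assumption: the real engine keeps some unpicklable resource like this.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE zips (zipcode INTEGER, lat REAL, lng REAL)")
        self.conn.execute("INSERT INTO zips VALUES (10023, 40.77898, -73.96925)")

    def by_coordinates(self, lat, lng):
        # Return the zip code of the nearest row (squared distance in degrees).
        cur = self.conn.execute(
            "SELECT zipcode FROM zips "
            "ORDER BY (lat - ?) * (lat - ?) + (lng - ?) * (lng - ?) LIMIT 1",
            (lat, lat, lng, lng),
        )
        return [row[0] for row in cur.fetchall()]


def can_pickle(obj):
    """Report whether obj survives pickle.dumps."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


_engine = None  # created lazily, never pickled as a live object


def get_engine():
    """Build the engine on first use in the current process."""
    global _engine
    if _engine is None:
        _engine = FakeSearchEngine()
    return _engine


zipcodes = [10023]


def in_borough(lat, long):
    result = get_engine().by_coordinates(lat, long)
    return (int(result[0]) in zipcodes) if result else False


print(can_pickle(FakeSearchEngine()))   # the live engine cannot be pickled
print(in_borough(40.77898, -73.96925))  # the lazy lookup still works
```

In a real udf the same idea would mean calling a helper like get_engine() inside in_borough, so the engine is built on the worker on first use rather than pickled on the driver.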