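I have two RDDs: one holds pairs of a locality name and the multipolygon coordinates of that locality, and the other holds the multipolygon coordinates of the trees in the area. I want to find the area of intersection between the polygons of these two RDDs in PySpark.
RDD a
contains the multipolygon coordinates of the trees in the area.
RDD b
contains pairs of a locality name and the multipolygon coordinates of that locality.
I then took the Cartesian product of a and b.
Now, by passing each pair to the function intersection_area, I am trying to compute the area where the polygons intersect.
Here I get the following error: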
File "/usr/hdp/current/spark-client/python/pyspark/worker.py", line 98, in main
command = pickleSer._read_with_length(infile)
File "/usr/hdp/current/spark-client/python/pyspark/serializers.py", line 164, in _read_with_length
return self.loads(obj)
File "/usr/hdp/current/spark-client/python/pyspark/serializers.py", line 442, in loads
return pickle.loads(obj)
ImportError: No module named shapely.geometry.geo
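The traceback is raised in worker.py while the worker unpickles the function, so it may help to check whether shapely can be imported on the executors at all, and not only on the driver. The following is a minimal diagnostic sketch, not a fix: `can_import` is a hypothetical helper, and the commented-out cluster check assumes an existing SparkContext named `sc`.

```python
import importlib


def can_import(module_name):
    """Try to import module_name in the current Python process.

    Returns (True, '') on success, or (False, error text) on failure.
    Any exception is caught, not only ImportError, because a package
    can also fail to load when a native library it needs is missing.
    """
    try:
        importlib.import_module(module_name)
        return (True, '')
    except Exception as exc:
        return (False, str(exc))


# Local check in the driver's Python environment.
print(can_import('shapely.geometry'))

# Hypothetical check on the cluster (assumes a SparkContext named `sc`):
# run the same test inside several tasks and collect the distinct results.
# print(sc.parallelize(range(8), 8)
#         .map(lambda _: can_import('shapely.geometry'))
#         .distinct()
#         .collect())
```

If the local check succeeds but the cluster check reports failures, the worker nodes are probably using a Python environment that does not have shapely installed.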
My code:
from shapely import *
from shapely.geometry import asShape

def to_shape(multi_polygon):
    def to_multi_polygon_json(multi_polygon_json):
        ps = []
        for p in multi_polygon_json:
            ls = []
            for l in p:
                cs = []
                for c in l:
                    ds = []
                    for d in c:
                        ds.append(d)
                    cs.append((ds[0], ds[1]))
                ls.append(cs)
            ps.append(ls)
        return {'type': 'MultiPolygon', 'coordinates': ps}
    return asShape(to_multi_polygon_json(multi_polygon))

def intersection_area(multi_polygon_a, multi_polygon_b):
    a = to_shape(multi_polygon_a)
    b = to_shape(multi_polygon_b)
    return a.intersection(b).area
a.first()
#[[[[144.96233513562794, -37.82850434925475], [144.96233130242123, -37.82850454785653], [144.96232748492085, -37.82850421683983], [144.9623237976764, -37.828503363594265], [144.96232160123958, -37.82850250460986], [144.9623222133609, -37.82848025440896], [144.96235226880833, -37.828480774593196], [144.96235264766736, -37.82848145793179], [144.96235356040816, -37.82848441696255], [144.96235381193424, -37.828487457370365], [144.96235339346634, -37.82849048528118], [144.9623523188634, -37.8284934099163], [144.96235062012963, -37.82849614171223], [144.96234834951218, -37.82849859866522], [144.9623455749836, -37.82850070535189], [144.96234238125348, -37.828502397452624], [144.9623388662872, -37.82850362439485], [144.96233513562794, -37.82850434925475]]]]
b.first()
#(u'Brunswick', [[[[144.974079984, -37.75927600899996], [144.9740860060001, -37.75923499499993], [144.9738775180001, -37.759211258999976], [144.9738900010001, -37.75913200499997], [144.97391000300001, -37.759015991999945], [144.97396559300012, -37.75870125099994], [144.97343353200006, -37.75863879499997], [144.97221530900003, -37.75849641899999], [144.9718280070001, -37.75845113099996], [144.97005564000006, -37.75824387599994], [144.9698756580001, -37.75926698099994], [144.97050061100003, -37.75933979699994], [144.97093032600003, -37.75938983999998], [144.9716583180001, -37.75947462499994], [144.9717951570001, -37.759490552999985], [144.97285690600006, -37.75961418899994], [144.9738259070001, -37.75972707599993], [144.9739976400001, -37.75974705599998], [144.97404600200002, -37.75946600399993]]]])
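For reference, the samples above are nested as multipolygon, polygon, ring, coordinate pair, and the nested loops in to_multi_polygon_json only turn each coordinate pair into a tuple. A more compact sketch of the same conversion in pure Python (no Shapely needed) could look like this; the `sample` value is made-up illustration data, not taken from either RDD.

```python
def to_multi_polygon_json(multi_polygon):
    """Convert nested coordinate lists into a GeoJSON-like MultiPolygon dict.

    Expected nesting: multipolygon -> polygon -> ring -> [x, y] pair.
    Each pair becomes an (x, y) tuple; the nesting itself is preserved.
    """
    return {
        'type': 'MultiPolygon',
        'coordinates': [
            [[(c[0], c[1]) for c in ring] for ring in polygon]
            for polygon in multi_polygon
        ],
    }


# Made-up sample with the same nesting depth as a.first().
sample = [[[[144.0, -37.0], [145.0, -37.0], [145.0, -38.0], [144.0, -37.0]]]]
geojson = to_multi_polygon_json(sample)
print(geojson['type'], len(geojson['coordinates'][0][0]))
```

The resulting dict has the same structure as the one the original function passes to asShape.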
Code:
c = b.cartesian(a)
d = c.map(lambda x: (x[0][0],intersection_area(x[0][1],x[1])))
My expected result is an RDD of key-value pairs, with the locality as the key and the intersection area as the value.