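I need to pass coordinates in a URL, but first I need to convert the RDD to a string, with the coordinate pairs separated by semicolons.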
all_coord_iso_rdd.take(4)
[(-73.57534790039062, 45.5311393737793),
(-73.574951171875, 45.529457092285156),
(-73.5749282836914, 45.52922821044922),
(-73.57501220703125, 45.52901077270508)]
type(all_coord_iso_rdd)
pyspark.rdd.PipelinedRDD
Expected result:
"-73.57534790039062,45.5311393737793;-73.574951171875,45.529457092285156,
-73.5749282836914,45.52922821044922;-73.57501220703125,45.52901077270508"
My URL has the following format:
http://127.0.0.1/match/v1/driving/-73.57534790039062,45.5311393737793; -73.574951171875,45.529457092285156,-73.5749282836914,45.52922821044922;-73.57501220703125,45.52901077270508
Answer 0 (score: 1)
In the snippet you posted, all_coord_iso_rdd is an rdd in which each row is a tuple(float, float). Calling take(n) returns n records from the rdd.
x = all_coord_iso_rdd.take(4)
print(x)
#[(-73.57534790039062, 45.5311393737793),
# (-73.574951171875, 45.529457092285156),
# (-73.5749282836914, 45.52922821044922),
# (-73.57501220703125, 45.52901077270508)]
The value returned is just a list of tuples of floats. To convert it to the desired format, we can use str.join inside a list comprehension.
First, each float has to be converted to a str; then the values in each tuple can be joined with ",". We use map(str, ...) to map each value to a str.
This yields:
print([",".join(map(str, item)) for item in x])
#['-73.5753479004,45.5311393738',
# '-73.5749511719,45.5294570923',
# '-73.5749282837,45.5292282104',
# '-73.575012207,45.5290107727']
Finally, join the resulting list with ";" to get the desired output.
print(";".join([",".join(map(str, item)) for item in x]))
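As a rough sketch of how the joined string could be turned into the final URL, the two joins can be wrapped in small helpers. The function names coords_to_path and build_match_url are hypothetical, and the base URL is simply the one from the question. This version uses repr instead of str because, on Python 2, str rounds floats to 12 significant digits (which is why the output above looks truncated), whereas repr keeps the full value; on Python 3 the two behave the same for floats.

```python
def coords_to_path(coords):
    """Format pairs as 'a,b;c,d;...' (hypothetical helper, not part of PySpark)."""
    # repr() keeps full float precision on both Python 2 and Python 3
    return ";".join(",".join(repr(v) for v in pair) for pair in coords)


def build_match_url(coords, base="http://127.0.0.1/match/v1/driving/"):
    """Append the coordinate path to the base URL taken from the question."""
    return base + coords_to_path(coords)


# Stand-in for the list returned by all_coord_iso_rdd.take(2)
sample = [(-73.57534790039062, 45.5311393737793),
          (-73.574951171875, 45.529457092285156)]
url = build_match_url(sample)
print(url)
```

Note that take(n) and collect() bring the records to the driver, so this approach assumes the coordinate list is small enough to fit in driver memory.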
Answer 1 (score: 1)
Here is a pure Spark approach (possibly useful for larger RDDs / different use cases):
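# sc is assumed to be an existing SparkContext (e.g. from the pyspark shell)
coords = [(-73.57534790039062, 45.5311393737793), (-73.574951171875, 45.529457092285156),
          (-73.5749282836914, 45.52922821044922), (-73.57501220703125, 45.52901077270508)]
rdd = sc.parallelize(coords)
# Format each tuple as "a,b", then fold the resulting strings together with ";"
result = (rdd.map(lambda row: ",".join(str(elt) for elt in row))
             .reduce(lambda x, y: ";".join([x, y])))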
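The map and reduce steps can be sanity-checked without a cluster by applying the same two lambdas with Python's built-in map and functools.reduce. This is only a local illustration of the logic, not a substitute for running it on Spark; the variable names are illustrative. One caveat worth keeping in mind: the Spark documentation describes RDD.reduce as expecting a commutative and associative operator, and joining strings with ";" is associative but not commutative, so relying on the output order matching the RDD order is an assumption.

```python
from functools import reduce

coords = [(-73.57534790039062, 45.5311393737793),
          (-73.574951171875, 45.529457092285156),
          (-73.5749282836914, 45.52922821044922),
          (-73.57501220703125, 45.52901077270508)]

# Same lambda as the rdd.map(...) step: one "a,b" string per tuple
pairs = list(map(lambda row: ",".join(str(elt) for elt in row), coords))

# Same lambda as the .reduce(...) step: fold the strings together with ";"
joined = reduce(lambda x, y: ";".join([x, y]), pairs)

# A single str.join over all pairs gives the same string
assert joined == ";".join(pairs)
print(joined)
```

Like RDD.reduce, functools.reduce without an initial value raises an error on empty input, so an empty RDD needs to be handled separately.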