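I have a DataFrame that contains a Polyline column (from Magellan). I want to extract some fields of this column into new columns. Here is an example of what I am trying to do: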
spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", $"polyline"("xcoordinates")) // Does not work
  .drop("polyline")
But then I get this error:
Can't extract value from polyline#1190: need struct type but got polyline;
Here is a sample of the data:
DF:(id,polyline,otherColumns)
ID1, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID2, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
ID3, {"xcoordinates":[55.37,55.376],"indices":[0],"empty":false,"ycoordinates":[25.23,25.232],"boundingBox":{"xmin":55.376,"ymin":25.23,"xmax":55.376,"ymax":25.234},"valid":true,"type":3}, ...
An example of the expected output:
DF2:(id,xcoordinates,otherColumns)
ID1, [55.37,55.376], ...
ID2, [55.37,55.376], ...
ID3, [55.37,55.376], ...
Edit: I finally managed to do what I wanted:
import magellan.PolyLine
import org.apache.spark.sql.functions.udf

// Wrap the field accessor in a UDF, because polyline is not a struct column
val xcoordinates = (data: PolyLine) => data.xcoordinates
val getXcoordinatesUDF = udf(xcoordinates)

spark.read
  .format("magellan")
  .load(My_Path)
  .withColumn("xcoordinates", getXcoordinatesUDF($"polyline"))
  .drop("polyline")
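
For anyone wondering why the original attempt failed: as far as I can tell from the error message, the $"col"("field") syntax only works when the column is a plain struct (or a map/array), and the Magellan polyline column is reported as its own type rather than a struct. Below is a minimal sketch of the contrast using only Spark. Shape, Record and StructFieldDemo are hypothetical stand-ins I made up for illustration; they are not Magellan types.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical stand-ins for illustration only; these are NOT Magellan types.
case class Shape(xcoordinates: Seq[Double], ycoordinates: Seq[Double])
case class Record(id: String, shape: Shape)

object StructFieldDemo {
  // Builds a one-row DataFrame whose "shape" column is a plain struct,
  // then pulls the nested field out with the $"col"("field") syntax.
  def extractX(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq(
      Record("ID1", Shape(Seq(55.37, 55.376), Seq(25.23, 25.232)))
    ).toDF()
    df.withColumn("xcoordinates", $"shape"("xcoordinates")).drop("shape")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-field-demo")
      .getOrCreate()
    try extractX(spark).show(false)
    finally spark.stop()
  }
}
```

Here the extraction succeeds because case classes are encoded as struct columns. With the polyline column that is not the case, which is presumably why going through a UDF that receives the PolyLine object was needed.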