I am trying to compute the Euclidean distance between two vectors. I have the following DataFrame:
root
|-- h: string (nullable = true)
|-- id: string (nullable = true)
|-- sid: string (nullable = true)
|-- features: vector (nullable = true)
|-- episodeFeatures: vector (nullable = true)
import org.apache.spark.mllib.util.{MLUtils}
val jP2 = jP.withColumn("dist", MLUtils.fastSquaredDistance("features", 5, "episodeFeatures", 5))
I get an error:
error: method fastSquaredDistance in object MLUtils cannot be accessed in object org.apache.spark.mllib.util.MLUtils
Is there any way to access this private method?
Answer 0 (score: 4)
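MLUtils.fastSquaredDistance is internal to the mllib package, and even if it were accessible, it could not be used on Columns or (guessing from the version) on ml vectors. You have to write your own udf: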
import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.Vector
val euclidean = udf((v1: Vector, v2: Vector) => ???) // Fill with preferred logic
val jP2 = jP.withColumn("dist", euclidean($"features", $"episodeFeatures"))
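One possible way to fill in the ??? is sketched below, assuming Spark 2.x or later and ml (not mllib) vectors. It relies on the public Vectors.sqdist from org.apache.spark.ml.linalg, which returns the squared distance, so taking the square root gives the Euclidean distance. The names jP, features and episodeFeatures come from the question; the object name EuclideanExample, the local SparkSession and the sample row are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.ml.linalg.{Vector, Vectors}

object EuclideanExample {
  // Public alternative to the private MLUtils.fastSquaredDistance:
  // Vectors.sqdist returns the squared distance, so take its square root.
  def euclideanDistance(v1: Vector, v2: Vector): Double =
    math.sqrt(Vectors.sqdist(v1, v2))

  def main(args: Array[String]): Unit = {
    // Hypothetical local session and sample data, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("euclidean-example")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the question's DataFrame, reduced to the relevant columns.
    val jP = Seq(
      ("a", Vectors.dense(0.0, 0.0), Vectors.dense(3.0, 4.0))
    ).toDF("id", "features", "episodeFeatures")

    // Wrap the plain function in a udf so it can be applied to Columns.
    val euclidean = udf((v1: Vector, v2: Vector) => euclideanDistance(v1, v2))
    val jP2 = jP.withColumn("dist", euclidean(col("features"), col("episodeFeatures")))
    jP2.show()

    spark.stop()
  }
}
```

Note that Vectors.sqdist requires both vectors to have the same size, and that this sketch does not handle null vectors even though both columns in the question's schema are nullable; rows with a missing vector would need to be filtered out or handled inside the udf.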