Euclidean distance in Spark 2.1

Asked: 2017-10-03 16:54:10

Tags: scala apache-spark apache-spark-mllib

I am trying to compute the Euclidean distance between two vectors. I have the following DataFrame:

root
 |-- h: string (nullable = true)
 |-- id: string (nullable = true)
 |-- sid: string (nullable = true)
 |-- features: vector (nullable = true)
 |-- episodeFeatures: vector (nullable = true)

import org.apache.spark.mllib.util.{MLUtils}
val jP2 = jP.withColumn("dist", MLUtils.fastSquaredDistance("features", 5, "episodeFeatures", 5)) 

I get an error:

error: method fastSquaredDistance in object MLUtils cannot be accessed in object org.apache.spark.mllib.util.MLUtils

Is there a way to access that private method?

1 answer:

Answer 0: (score: 4)

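MLUtils.fastSquaredDistance is internal to Spark's mllib package, which is why the compiler refuses access. Even if it were public, it could not be applied to Columns or (guessing from the version) to ml vectors: it works on local vector objects, not on DataFrame columns. You have to write your own udf: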

import org.apache.spark.sql.functions._
import org.apache.spark.ml.linalg.Vector

val euclidean = udf((v1: Vector, v2: Vector) => ???)  // Fill with preferred logic

val jP2 = jP.withColumn("dist", euclidean($"features", $"episodeFeatures"))
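
One way to fill in the ??? placeholder is to take the square root of Vectors.sqdist from org.apache.spark.ml.linalg, which returns the squared distance between two ml vectors. The following is a minimal sketch, not the answerer's own code: the local SparkSession, the object name and the single sample row standing in for jP are assumptions made only so the example runs on its own.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object EuclideanDistanceSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in a real job, reuse the existing one.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("euclidean-distance-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for jP, with the same two vector columns as in the question.
    val jP = Seq(
      ("a", Vectors.dense(0.0, 0.0), Vectors.dense(3.0, 4.0))
    ).toDF("id", "features", "episodeFeatures")

    // Vectors.sqdist gives the squared distance; the square root is the Euclidean distance.
    val euclidean = udf((v1: Vector, v2: Vector) => math.sqrt(Vectors.sqdist(v1, v2)))

    val jP2 = jP.withColumn("dist", euclidean(col("features"), col("episodeFeatures")))
    jP2.show(truncate = false)

    spark.stop()
  }
}
```

For the sample row, (0, 0) and (3, 4), dist is 5.0. Both columns in the question are nullable, and this sketch does not handle null vectors, so rows with a missing vector would need to be filtered out or handled inside the udf first. Vectors.sqdist also requires both vectors to have the same size.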