Scala Spark - Converting RDD[List[scala.Double]] to RDD[scala.Double]

Asked: 2015-09-21 14:19:38

Tags: scala apache-spark

I am calling the MLlib Statistics.corr() function and getting the following error:

    (x: org.apache.spark.api.java.JavaRDD[java.lang.Double], y: org.apache.spark.api.java.JavaRDD[java.lang.Double], method: String) scala.Double
    (x: org.apache.spark.rdd.RDD[scala.Double], y: org.apache.spark.rdd.RDD[scala.Double], method: String) scala.Double
    cannot be applied to (org.apache.spark.rdd.RDD[List[scala.Double]], org.apache.spark.rdd.RDD[List[scala.Double]], String)

println(Statistics.corr(a, b, "pearson"))

What do I need to do to convert my data into the correct input type for corr()?

2 answers:

Answer 0 (score: 4)

Try flatMap with the identity function:

flatMap(identity)

Answer 1 (score: 0)

As suggested by this answer, you want to flatten the RDD. Unfortunately there is no flatten method on RDD, so you can use flatMap(identity):

println(Statistics.corr(a.flatMap(identity), b.flatMap(identity), "pearson"))
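Putting it together, here is a minimal end-to-end sketch. The sample data, the local-mode SparkContext, and the object name CorrDemo are illustrative assumptions, not from the question; the variable names a and b mirror the question's RDDs:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

object CorrDemo extends App {
  // Local-mode context for illustration; in a real job sc would already exist.
  val sc = new SparkContext(new SparkConf().setAppName("corr").setMaster("local[*]"))

  // Hypothetical sample data shaped like the question's inputs: RDD[List[Double]].
  val a: RDD[List[Double]] = sc.parallelize(Seq(List(1.0, 2.0), List(3.0)))
  val b: RDD[List[Double]] = sc.parallelize(Seq(List(2.0, 4.0), List(6.0)))

  // flatMap(identity) flattens RDD[List[Double]] into RDD[Double],
  // which matches corr's (RDD[Double], RDD[Double], String) overload.
  val r = Statistics.corr(a.flatMap(identity), b.flatMap(identity), "pearson")
  println(r) // close to 1.0 for this perfectly linear sample

  sc.stop()
}
```

On plain Scala collections the same trick is simply `nested.flatMap(identity)`, which is equivalent to `nested.flatten`; RDD lacks `flatten`, hence the flatMap form.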