Spark pairRDD not working

Time: 2015-06-09 07:31:45

Tags: scala apache-spark


value subtractByKey is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]


value join is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]

Why is this happening? org.apache.spark.rdd.RDD[(String, LabeledPoint)] is a pair RDD, and I have already imported org.apache.spark.rdd._

1 answer:

Answer 0: (score: 0)

In the spark-shell this works exactly as expected, without importing anything:

scala> case class LabeledPoint(x: Int, y: Int, label: String)
defined class LabeledPoint

scala> val rdd1 = sc.parallelize(List("this","is","a","test")).map(label => (label, LabeledPoint(0,0,label)))
rdd1: org.apache.spark.rdd.RDD[(String, LabeledPoint)] = MapPartitionsRDD[1] at map at <console>:23

scala> val rdd2 = sc.parallelize(List("this","is","a","test")).map(label => (label, 1))
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:21

scala> rdd1.join(rdd2)
res0: org.apache.spark.rdd.RDD[(String, (LabeledPoint, Int))] = MapPartitionsRDD[6] at join at <console>:28
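
The asker's subtractByKey is resolved the same way, through the implicit conversion to PairRDDFunctions. Continuing the session above (rdd1 and rdd2 as defined there), a quick sketch of the follow-up check might look like:

scala> rdd1.subtractByKey(rdd2)

If these methods are still missing outside the shell, one plausible cause worth checking: in Spark versions before 1.3, the pair-RDD operations required import org.apache.spark.SparkContext._ to bring the implicit conversion into scope; importing org.apache.spark.rdd._ alone does not do this.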