value subtractByKey is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]
value join is not a member of org.apache.spark.rdd.RDD[(String, LabeledPoint)]

Why does this happen? org.apache.spark.rdd.RDD[(String, LabeledPoint)] is a pair RDD, and I have already imported org.apache.spark.rdd._
Answer 1 (score: 0)
In spark-shell, this works exactly as expected, without importing anything:
scala> case class LabeledPoint(x: Int, y: Int, label: String)
defined class LabeledPoint
scala> val rdd1 = sc.parallelize(List("this","is","a","test")).map(label => (label, LabeledPoint(0,0,label)))
rdd1: org.apache.spark.rdd.RDD[(String, LabeledPoint)] = MapPartitionsRDD[1] at map at <console>:23
scala> val rdd2 = sc.parallelize(List("this","is","a","test")).map(label => (label, 1))
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:21
scala> rdd1.join(rdd2)
res0: org.apache.spark.rdd.RDD[(String, (LabeledPoint, Int))] = MapPartitionsRDD[6] at join at <console>:28
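For reference, subtractByKey (the other method from the question) has simple semantics: it keeps the pairs from the left RDD whose key does not appear in the right RDD. Below is a minimal plain-Scala sketch of that behavior on ordinary collections, so it runs without a Spark cluster; the object and method names are illustrative only, not part of the Spark API.

```scala
// Plain-Scala analogue of RDD.subtractByKey: keep pairs from `left`
// whose key does not occur as a key in `right`.
object SubtractByKeyDemo {
  def subtractByKey[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, V)] = {
    // Collect the keys present on the right side once, then filter the left side.
    val rightKeys = right.map(_._1).toSet
    left.filterNot { case (k, _) => rightKeys.contains(k) }
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(("this", 1), ("is", 2), ("a", 3), ("test", 4))
    val b = Seq(("is", "x"), ("test", "y"))
    println(subtractByKey(a, b)) // List((this,1), (a,3))
  }
}
```

On real RDDs the call site looks the same (`rdd1.subtractByKey(rdd2)`); the point is that it, like join, only becomes available once the compiler sees the RDD as a pair RDD.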