How to declare a function that calls cogroup

Time: 2015-11-17 04:05:36

Tags: scala apache-spark

I want to declare a function that computes the interSectionByKey of two RDDs. Under the hood it is a cogroup. The following code does not compile:

def getRetain[K, V](activeUserRdd : RDD[(K, V)], newUserRdd : RDD[(K, V)]): RDD[(K, V)] ={
    activeUserRdd.cogroup(newUserRdd).flatMapValues{
      x => Option((if (!x._1.isEmpty && !x._2.isEmpty) x._2.head else  null).asInstanceOf[V])
    }
  }

Error:

value cogroup is not a member of org.apache.spark.rdd.RDD[(K, V)]

I think the [(K, V)] declared in my function does not match the real (K, V), but what is the correct way to declare this in my function?

1 answer:

Answer 0 (score: 0)

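Add a ClassTag context bound to your type parameters so that the erased types K and V are still available at runtime. This is needed because of type erasure in Scala. cogroup is not defined on RDD itself: it comes from the implicit conversion to PairRDDFunctions, and that conversion requires an implicit ClassTag[K] and ClassTag[V]. Without them the conversion does not apply, which is why the compiler reports that cogroup is not a member of RDD[(K, V)].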

scala> import scala.reflect.ClassTag
import scala.reflect.ClassTag

scala> def getRetain[K : ClassTag, V : ClassTag](activeUserRdd : RDD[(K, V)], newUserRdd : RDD[(K, V)]): RDD[(K, V)] ={
 |       activeUserRdd.cogroup(newUserRdd).flatMapValues{
 |         x => Option((if (!x._1.isEmpty && !x._2.isEmpty) x._2.head else  null).asInstanceOf[V])
 |       }
 |     }
 getRetain: [K, V](activeUserRdd: org.apache.spark.rdd.RDD[(K, V)], newUserRdd: org.apache.spark.rdd.RDD[(K, V)])(implicit evidence$1: scala.reflect.ClassTag[K], implicit evidence$2: scala.reflect.ClassTag[V])org.apache.spark.rdd.RDD[(K, V)]
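
The same compile-time mechanism can be reproduced without Spark. The following is only a minimal sketch: Box, PairOps, toPairOps, and distinctKeys are hypothetical names invented for this illustration, not Spark classes. Box stands in for RDD and PairOps for PairRDDFunctions, assuming the only relevant detail is that the implicit conversion asks for ClassTag instances.

```scala
import scala.language.implicitConversions
import scala.reflect.ClassTag

// Hypothetical stand-in for RDD: a plain wrapper around a Seq.
final case class Box[T](items: Seq[T])

// Hypothetical stand-in for PairRDDFunctions: extra methods for boxes of pairs.
final class PairOps[K: ClassTag, V](box: Box[(K, V)]) {
  // Building an Array[K] needs a ClassTag[K], because K is erased at runtime.
  def keysArray: Array[K] = box.items.map(_._1).distinct.toArray
}

object Box {
  // The conversion applies only when ClassTags for K and V are in implicit scope.
  implicit def toPairOps[K, V](box: Box[(K, V)])(
      implicit kt: ClassTag[K], vt: ClassTag[V]): PairOps[K, V] =
    new PairOps[K, V](box)
}

object ClassTagDemo {
  // The ": ClassTag" context bounds supply the implicits that toPairOps needs.
  // If they are dropped, the conversion no longer applies and the call to
  // keysArray is expected to be rejected as not being a member of Box[(K, V)].
  def distinctKeys[K: ClassTag, V: ClassTag](box: Box[(K, V)]): Array[K] =
    box.keysArray

  def main(args: Array[String]): Unit = {
    val keys = distinctKeys(Box(Seq("a" -> 1, "b" -> 2, "a" -> 3)))
    println(keys.mkString(","))
  }
}
```

At a call site with concrete types, such as (String, Int), the compiler can create the ClassTag instances itself, so only generic methods like distinctKeys (or getRetain above) need the explicit context bounds.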