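I am trying to use a case class as the key of an RDD and then call reduceByKey on it. With the case class as the key, equal keys are not combined, but when I use tuples as keys instead, it works fine.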
case class Employee(id: Int, name: String)
val e1 = Employee(1,"chan")
val e2 = Employee(1,"joey")
val salary = Array((e1,100),(e2,1100),(e1,190),(e2,110))
val salaryRDD = sc.parallelize(salary)
salaryRDD.reduceByKey(_+_).collect
Output:
res1: Array[(Employee, Int)] = Array((Employee(1,chan),100), (Employee(1,chan),190), (Employee(1,joey),1100), (Employee(1,joey),110))
With tuples as keys, however, the values are summed per key as expected:
val t1 = (1,"chan")
val t2 = (1,"joey")
val salary2 = Array((t1,100),(t2,1100),(t1,190),(t2,110))
val salaryRDD2 = sc.parallelize(salary2)
salaryRDD2.reduceByKey(_+_).collect
Output:
res2: Array[((Int, String), Int)] = Array(((1,chan),290), ((1,joey),1210))
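For reference, the per-key totals that reduceByKey(_ + _) should return for the case-class keys can be reproduced with plain Scala collections, without Spark. This is only a local sketch of the expected result (the object name LocalReduceByKey is made up for illustration); it gives the same sums as the tuple version above.

```scala
// Plain Scala collections, no Spark: computes the per-key sums that
// reduceByKey(_ + _) is expected to produce for the case-class keys.
case class Employee(id: Int, name: String)

object LocalReduceByKey {
  val e1 = Employee(1, "chan")
  val e2 = Employee(1, "joey")
  val salary = Array((e1, 100), (e2, 1100), (e1, 190), (e2, 110))

  // Group the pairs by their Employee key, then sum the values in each group.
  val totals: Map[Employee, Int] =
    salary.groupBy(_._1).map { case (k, pairs) => k -> pairs.map(_._2).sum }

  def main(args: Array[String]): Unit =
    // Sort by name only to make the printed order deterministic.
    totals.toSeq.sortBy(_._1.name).foreach(println)
}
```

Running main prints (Employee(1,chan),290) followed by (Employee(1,joey),1210).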
hashCode and equals behave correctly for the case class:
scala> val em1 = Employee(1,"chan")
em1: Employee = Employee(1,chan)
scala> val em2 = Employee(1,"chan")
em2: Employee = Employee(1,chan)
scala> em1 == em2
res5: Boolean = true
scala> em1.hashCode
res6: Int = 545142355
scala> em2.hashCode
res7: Int = 545142355
Why does this happen? How can I make a case class work as a key with reduceByKey?
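One possible workaround, based only on the observation above that tuple keys are combined correctly, is to re-key each pair by the case class's fields as a tuple, reduce, and then rebuild the Employee. The sketch below assumes Spark is on the classpath and uses a local master; the object name TupleKeyWorkaround is hypothetical. In spark-shell only the map / reduceByKey / map chain would be applied to the existing salaryRDD. It is wrapped in an object here just so the snippet compiles on its own.

```scala
import org.apache.spark.{SparkConf, SparkContext}

case class Employee(id: Int, name: String)

object TupleKeyWorkaround {
  // Hypothetical helper: reduce using (id, name) tuples as keys, then map
  // the tuple keys back to Employee instances.
  def totals(sc: SparkContext): Array[(Employee, Int)] = {
    val e1 = Employee(1, "chan")
    val e2 = Employee(1, "joey")
    val salaryRDD = sc.parallelize(Seq((e1, 100), (e2, 1100), (e1, 190), (e2, 110)))
    salaryRDD
      .map { case (e, s) => ((e.id, e.name), s) }              // tuple key
      .reduceByKey(_ + _)                                       // sum per tuple key
      .map { case ((id, name), s) => (Employee(id, name), s) }  // rebuild Employee
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("tuple-key").setMaster("local[*]"))
    try totals(sc).sortBy(_._1.name).foreach(println)
    finally sc.stop()
  }
}
```

The trade-off is an extra map on each side of the shuffle. It also keys the reduction on whatever fields you choose to put in the tuple, so all fields that define equality must be included.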