Why doesn't a case class used as a key with reduceByKey work in Spark?

Asked: 2016-10-01 10:43:37

Tags: scala apache-spark

I am trying to use a case class as the key in an RDD and then call reduceByKey on it. The case-class keys are not merged, but when I use tuples as keys it works fine.

case class Employee(id: Int, name: String)

val e1 = Employee(1,"chan")
val e2 = Employee(1,"joey")

val salary = Array((e1,100),(e2,1100),(e1,190),(e2,110))

val salaryRDD = sc.parallelize(salary)
salaryRDD.reduceByKey(_+_).collect

Output:

 res1: Array[(Employee, Int)] = Array((Employee(1,chan),100),
  (Employee(1,chan),190), (Employee(1,joey),1100), (Employee(1,joey),110))

But when used with tuples, it works fine.

val t1 = (1,"chan")
val t2 = (1,"joey")
val salary2 = Array((t1,100),(t2,1100),(t1,190),(t2,110))
val salaryRDD2 = sc.parallelize(salary2)
salaryRDD2.reduceByKey(_+_).collect

Output:

res2: Array[((Int, String), Int)] = Array(((1,chan),290), ((1,joey),1210))

hashCode and equals work fine for the case class (see also the local check after the session below):

scala> val em1 = Employee(1,"chan")
em1: Employee = Employee(1,chan)

scala> val em2 = Employee(1,"chan")
em2: Employee = Employee(1,chan)

scala> em1 == em2
res5: Boolean = true

scala> em1.hashCode
res6: Int = 545142355

scala> em2.hashCode
res7: Int = 545142355
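
For comparison, here is a minimal local check with no Spark involved (a sketch, assuming the same Employee definition as above): plain Scala collections group these case-class keys correctly, which suggests equals/hashCode behave as expected within a single JVM.

case class Employee(id: Int, name: String)

val salary = Seq(
  (Employee(1, "chan"), 100), (Employee(1, "joey"), 1100),
  (Employee(1, "chan"), 190), (Employee(1, "joey"), 110))

// groupBy uses the same equals/hashCode contract that reduceByKey relies on,
// and here the duplicate keys collapse as expected.
val totals = salary.groupBy(_._1).map { case (emp, rows) => emp -> rows.map(_._2).sum }
// totals: Map(Employee(1,chan) -> 290, Employee(1,joey) -> 1210)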

Why does this happen, and how can I get the case class to work with reduceByKey?

0 Answers:

No answers yet.