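When I run this locally, I see the correct output records. When I run it on a cluster, however, the output is different and seemingly inconsistent, although some of the mapGroups outputs are still correct. Is this a Spark issue? I'm not sure how best to describe what I'm seeing.
Maybe I don't understand mapGroups correctly, and not all of the values for each group are making it into the recordList variable.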
import scala.collection.mutable.ListBuffer

case class MyCaseClass(keyValue: Int, c2: String, c3: String, c4: Double)
case class NewClass(thing1: String, thing2: String, thing3: String, thing4: String)

case class WorkTodo(myClassRecords: Seq[MyCaseClass]) {
  def toNewRecords: Seq[NewClass] = {
    // e.g. work that requires all MyCaseClass.keyValue=1 to be in the list.
    // This function would create new Java objects to perform calculations
    // and eventually output a set of NewClass records.
    ??? // actual logic omitted here
  }
}

// ds is the input Dataset; spark.implicits._ must be in scope for the encoders
val processedRecords = ds.as[MyCaseClass].groupByKey(_.keyValue)
  .mapGroups {
    case (v, iter) =>
      // Copy every record of this group into a local buffer
      val recordList = new ListBuffer[MyCaseClass]
      iter.foreach { x =>
        recordList += MyCaseClass(x.keyValue, x.c2, x.c3, x.c4)
      }
      WorkTodo(recordList.toList).toNewRecords
  }
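To check the suspicion that some values never reach recordList, a minimal sketch along these lines could compare the number of records seen inside mapGroups with a plain per-key count. The names Rec, GroupSizeCheck and countsMatch are hypothetical and only used for illustration; Rec simply mirrors the shape of MyCaseClass.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type with the same shape as MyCaseClass above.
case class Rec(keyValue: Int, c2: String, c3: String, c4: Double)

object GroupSizeCheck {
  // Returns true when mapGroups sees exactly as many records per key
  // as a plain groupByKey(...).count() reports.
  def countsMatch(spark: SparkSession, ds: Dataset[Rec]): Boolean = {
    import spark.implicits._

    // Number of records handed to the function for each key.
    val seenInGroups: Map[Int, Long] = ds
      .groupByKey(_.keyValue)
      .mapGroups { (key, iter) => (key, iter.size.toLong) }
      .collect()
      .toMap

    // Reference count per key, computed without a user function.
    val expected: Map[Int, Long] = ds
      .groupByKey(_.keyValue)
      .count()
      .collect()
      .toMap

    seenInGroups == expected
  }
}
```

If the counts match on the cluster as well, missing values are probably not the cause, and something else (for example the order in which the values of a group arrive) would be the next thing to look at.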
P.S. Other solutions that still use Datasets are welcome :)
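For reference, one Dataset-based variant might look like the sketch below. It rests on two assumptions that are not confirmed by the code above: that a flat Dataset of NewClass-like records is wanted (flatMapGroups instead of mapGroups, which would yield one Seq per group), and that the per-group work is sensitive to record order. As far as I know, Spark does not guarantee the order of values inside a group, so the sketch sorts them explicitly. InRec, OutRec, FlatGroups and the body of toNewRecords are hypothetical stand-ins, and the sort key (c2, c3) is chosen purely for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-ins for MyCaseClass and NewClass.
case class InRec(keyValue: Int, c2: String, c3: String, c4: Double)
case class OutRec(thing1: String, thing2: String, thing3: String, thing4: String)

object FlatGroups {
  // Placeholder for WorkTodo(...).toNewRecords; the real logic is not known here.
  def toNewRecords(records: Seq[InRec]): Seq[OutRec] =
    records.map(r => OutRec(r.keyValue.toString, r.c2, r.c3, r.c4.toString))

  def process(spark: SparkSession, ds: Dataset[InRec]): Dataset[OutRec] = {
    import spark.implicits._
    ds.groupByKey(_.keyValue).flatMapGroups { (_, iter) =>
      // Materialise the iterator once, then sort so that the result
      // does not depend on the order in which records arrive.
      val records = iter.toList.sortBy(r => (r.c2, r.c3))
      toNewRecords(records)
    }
  }
}
```

Sorting inside the group costs memory for large groups, so this is only reasonable when each key's records fit comfortably on one executor.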