Below is my code, where I try to iterate over each row:
import org.apache.spark.sql.DataFrame

val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true") // Use first line of all files as header
  .option("delimiter", TILDE)
  .option("inferSchema", "true") // Automatically infer data types
  .load(fileName._2)
val accGrpCountsIds: DataFrame = df.groupBy("accgrpid").count()
LOGGER.info(s"DataFrame Count - ${accGrpCountsIds.count()}")
accGrpCountsIds.show(3)
// switch based on file names and update the model.
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  // count() produces a Long column, so read it with getLong rather than getInt
  val rowCount = accGrpRow.getLong(1)
})
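When I try to iterate over the DataFrame above with foreach, I get a "Task not serializable" error. What should I do?
Answer 0 (score: 0)
Is there anything else inside the foreach that you haven't shared? Or is this all there is, and it still doesn't work?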
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  // count() produces a Long column, so read it with getLong rather than getInt
  val rowCount = accGrpRow.getLong(1)
})
Also, you might find this useful: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
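To illustrate what that linked question is getting at, here is a minimal sketch that needs no Spark at all. Spark Java-serializes the closure passed to foreach before shipping it to the executors, so the same check can be reproduced with a plain ObjectOutputStream. The names ModelUpdater, ClosureCheck, prefix, capturingThis and capturingLocal are hypothetical and only stand in for whatever enclosing class and fields the real foreach body touches; this is an assumption about the likely cause, not a diagnosis of the code above.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a driver-side class that does NOT extend Serializable.
class ModelUpdater {
  val prefix = "grp-"

  // Reads the field `prefix`, so the lambda has to capture `this`
  // (the whole ModelUpdater instance).
  def capturingThis: Long => String = id => prefix + id

  // Copies the field into a local val first, so the lambda captures
  // only that String and not the enclosing instance.
  def capturingLocal: Long => String = {
    val localPrefix = prefix
    id => localPrefix + id
  }
}

object ClosureCheck {
  // Returns true if `f` survives Java serialization, which is the same
  // kind of check Spark applies to a closure before running a task.
  def trySerialize(f: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Under these assumptions, ClosureCheck.trySerialize(new ModelUpdater().capturingThis) should report a failure while the capturingLocal variant should pass. Applied to the question, the usual remedies would be to copy any fields the foreach body needs into local vals before the closure, to move helper functions into an object, or to make the enclosing class Serializable.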