Task not serializable when iterating over a DataFrame in Scala

Date: 2019-06-25 22:44:52

Tags: scala apache-spark dataframe

Below is my code; the error occurs when I try to iterate over each row:

val df: DataFrame = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", true) // Use first line of all files as header
  .option("delimiter", TILDE)
  .option("inferSchema", "true") // Automatically infer data types
  .load(fileName._2)

val accGrpCountsIds: DataFrame = df.groupBy("accgrpid").count()
LOGGER.info(s"DataFrame Count - ${accGrpCountsIds.count()}")
accGrpCountsIds.show(3)

// Switch based on file names and update the model.
accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getInt(1)
})

When I try to iterate over the DataFrame above with foreach, I get a Task not serializable error. How do I fix this?
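As shown, the closure passed to foreach touches only accGrpRow, so this snippet alone should not fail, which is presumably why the answer below asks whether code was omitted. The usual trigger for this error is a closure that references a field of its enclosing class (the LOGGER used a few lines earlier is a classic example): Spark must serialize the closure to ship it to the executors, and the field reference drags the entire enclosing instance in. A minimal sketch of that failure mode, with a hypothetical class name not taken from the question:

import org.apache.spark.sql.DataFrame
import org.slf4j.{Logger, LoggerFactory}

class ModelUpdater {
  // SLF4J loggers are not Serializable.
  private val LOGGER: Logger = LoggerFactory.getLogger(classOf[ModelUpdater])

  def process(accGrpCountsIds: DataFrame): Unit = {
    accGrpCountsIds.foreach { accGrpRow =>
      // Referencing LOGGER captures `this`, so Spark tries to
      // serialize the whole ModelUpdater instance and fails.
      LOGGER.info(s"accgrpid = ${accGrpRow.getLong(0)}")
    }
  }
}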

1 Answer:

Answer 0 (score: 0)

Is there anything else inside the foreach that you haven't shared? Or is this really all there is, and it still doesn't work?

accGrpCountsIds.foreach(accGrpRow => {
  val accGrpId = accGrpRow.getLong(0)
  val rowCount = accGrpRow.getInt(1)
})

Also, you may find this useful: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects
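Building on that link, here is a hedged sketch of two common fixes, reusing the question's accGrpCountsIds; the updateModel helper and the RowHandler object are hypothetical, not from the question. Either collect the (small) aggregated rows to the driver so no closure is shipped at all, or move the per-row logic into a standalone object so the closure no longer captures an enclosing class instance:

// Option 1: groupBy(...).count() output is usually small, so bring it
// to the driver and loop locally; nothing is serialized to executors.
accGrpCountsIds.collect().foreach { accGrpRow =>
  val accGrpId = accGrpRow.getLong(0)
  // count() produces a LongType column, so getLong(1) is safer
  // than the getInt(1) in the question.
  val rowCount = accGrpRow.getLong(1)
  updateModel(accGrpId, rowCount) // hypothetical driver-side helper
}

// Option 2: keep the foreach distributed, but route the logic through
// a top-level object; objects are reloaded on each executor, so no
// enclosing instance needs to be serialized with the closure.
object RowHandler {
  def handle(accGrpId: Long, rowCount: Long): Unit = {
    // executor-side work goes here
  }
}

accGrpCountsIds.foreach { accGrpRow =>
  RowHandler.handle(accGrpRow.getLong(0), accGrpRow.getLong(1))
}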