I am currently writing a simple program to pull data out of a Hive database, and I need Joda-Time's date formatting... I have something like this:
import org.joda.time._
import org.joda.convert._
import org.joda.time.format.DateTimeFormat._

object DateExtract {
  // Change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}
However, when I pull the data into an RDD, this function call gives me an error saying the object it belongs to is not serializable:
Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$DateExtract$
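The failure can be reproduced without Spark at all. Before shipping a task to executors, Spark Java-serializes the closure together with everything it references; if that includes an object that is not Serializable, serialization throws exactly this exception. A minimal sketch (the `PlainDateExtract` stand-in and the millisecond arithmetic are illustrative, not the original Joda-Time code):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Roughly what Spark does before running a task: attempt a Java
// serialization round-trip of everything the closure references.
def isJavaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }

// Like the original DateExtract, this object does NOT extend Serializable,
// so the serialization attempt above fails for it.
object PlainDateExtract {
  def daysBetween(startMillis: Long, endMillis: Long): Long =
    (endMillis - startMillis) / 86400000L // whole days, at 86,400,000 ms per day
}
```

Running `isJavaSerializable(PlainDateExtract)` returns `false`, which is the same condition Spark reports as `Task not serializable`.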
Answer 0 (score: 2)
So I figured it out... basically you make the object extend Serializable, like this:
object DateExtract extends java.io.Serializable {
  // Change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}
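With the Serializable marker in place, the same round-trip that Spark performs now succeeds. A quick check (using a simplified stand-in object, since the Joda-Time logic is irrelevant to the serialization question):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for the fixed DateExtract: the only change that matters here
// is the Serializable marker on the object.
object SerializableDateExtract extends java.io.Serializable {
  def daysBetween(startMillis: Long, endMillis: Long): Long =
    (endMillis - startMillis) / 86400000L
}

// Java serialization of the object now completes without throwing
// NotSerializableException, so Spark can ship closures that reference it.
new ObjectOutputStream(new ByteArrayOutputStream())
  .writeObject(SerializableDateExtract)
```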
Or just define the function on its own:
def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
  val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
  1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
}
This works because Spark inherently serializes functions, but apparently not enclosing objects... Joda-Time's types are themselves supposed to be Serializable, but in the Spark/Scala ecosystem the enclosing object has to declare it explicitly.
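As an aside, the same logic can be written against the JDK's own `java.time` API, whose types are immutable and Serializable out of the box, which sidesteps this class of problem entirely. A sketch equivalent to `datesBetween` above (a `java.time` translation, not the original Joda-Time code; like the original, it excludes the start date and includes the end date):

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// java.time equivalent of datesBetween: every date after startDate up to
// and including endDate. LocalDate is Serializable, so no extra markers
// are needed when this is used inside a Spark closure.
def datesBetween(startDate: LocalDate, endDate: LocalDate): Seq[LocalDate] = {
  val days = ChronoUnit.DAYS.between(startDate, endDate)
  (1L to days).map(startDate.plusDays)
}
```

For example, `datesBetween(LocalDate.of(2020, 1, 1), LocalDate.of(2020, 1, 4))` yields the 2nd, 3rd, and 4th of January 2020.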