Joda-Time, Scala, and serializing DateTime

Posted: 2014-08-03 17:07:18

Tags: scala jodatime apache-spark

I'm currently writing a simple program to pull data from a Hive database, and I need to handle dates with Joda-Time... I have something like this:

import org.joda.time._
import org.joda.convert._
import org.joda.time.format.DateTimeFormat._

object DateExtract {
  // Change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}

However, when I pull the data into an RDD, the call to this function gives me an error saying the object is not serializable:

 Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$DateExtract$
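
For reference, a minimal sketch of the kind of call that triggers this (the sample rows and the spark-shell setup here are made up for illustration):

// Sketched as spark-shell input (sc is the shell's SparkContext); the sample
// rows stand in for data pulled from Hive.
import org.joda.time.DateTime

val rows = sc.parallelize(Seq(("a", "2014-08-01", "2014-08-03")))

// The closure references DateExtract; because the shell wraps every definition
// in nested $iwC objects, Spark serializes those wrappers along with the
// closure and fails if the enclosing chain is not Serializable.
val expanded = rows.flatMap { case (id, start, end) =>
  DateExtract.datesBetween(new DateTime(start), new DateTime(end)).map(d => (id, d))
}

expanded.collect() // running an action triggers the "Task not serializable" failure above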

1 Answer:

Answer 0 (score: 2):

So I figured it out... basically you have the object extend Serializable, like this:

object DateExtract extends java.io.Serializable {
  // Change depending on which segment you wish to capture, i.e. weekly, monthly, etc.
  def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
    val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
    1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
  }
}

Or just define the function on its own:

def datesBetween(startDate: DateTime, endDate: DateTime): Seq[DateTime] = {
  val daysBetween = Days.daysBetween(startDate.toDateMidnight(), endDate.toDateMidnight()).getDays()
  1 to daysBetween map { startDate.withFieldAdded(DurationFieldType.days(), _) }
}

This is because Spark inherently serializes functions (closures), but apparently not objects... Joda-Time's classes are supposed to be serializable already, but in the Spark/Scala ecosystem it has to be declared explicitly on the enclosing object.
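
To make the second variant concrete, here is a rough sketch of how it might be used from the spark-shell, assuming datesBetween has been defined as above; sc is the shell's SparkContext and the date range is made up:

import org.joda.time.DateTime

// Hypothetical RDD of (start, end) pairs standing in for the Hive data; the
// closure passed to flatMap is serialized and shipped to the executors,
// carrying datesBetween with it, so no separate object needs to be Serializable.
val ranges = sc.parallelize(Seq((new DateTime("2014-08-01"), new DateTime("2014-08-10"))))
val allDates = ranges.flatMap { case (start, end) => datesBetween(start, end) }
allDates.collect().foreach(println)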