I need to convert a tuple of dates, containing a start date and an end date, into a series of dates.
+-----------------------------------------+
|dateRange                                |
+-----------------------------------------+
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
|[2017-04-05 00:00:00,2017-04-04 00:00:00]|
|[2017-04-04 00:00:00,2017-04-03 00:00:00]|
|[2017-04-03 00:00:00,2017-03-31 00:00:00]|
|[2017-03-31 00:00:00,2017-03-30 00:00:00]|
|[2017-03-30 00:00:00,2017-03-29 00:00:00]|
|[2017-03-29 00:00:00,2017-03-28 00:00:00]|
|[2017-03-28 00:00:00,2017-03-27 00:00:00]|
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
|[2017-04-05 00:00:00,2017-04-04 00:00:00]|
|[2017-04-04 00:00:00,2017-04-03 00:00:00]|
|[2017-04-03 00:00:00,2017-03-31 00:00:00]|
|[2017-03-31 00:00:00,2017-03-30 00:00:00]|
|[2017-03-30 00:00:00,2017-03-29 00:00:00]|
|[2017-03-29 00:00:00,2017-03-28 00:00:00]|
|[2017-03-28 00:00:00,2017-03-27 00:00:00]|
|[2017-04-06 00:00:00,2017-04-05 00:00:00]|
+-----------------------------------------+
How can I convert each of these tuples into a series of dates running from the 'to' date to the 'from' date? For example,
|[2017-04-03 00:00:00,2017-03-31 00:00:00]|
should, after conversion, become
|[2017-04-03 00:00:00,2017-04-02 00:00:00,2017-04-01 00:00:00,2017-03-31 00:00:00]|
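To make the intended expansion concrete, here is a minimal plain-Scala sketch (not from the original post) of the day-by-day expansion, assuming each pair is (from, to) with from on or after to, using java.time:

import java.time.LocalDate

// expand a (from, to) pair into the descending, inclusive day-by-day series
def expand(from: LocalDate, to: LocalDate): Seq[LocalDate] =
  Iterator.iterate(from)(_.minusDays(1)).takeWhile(!_.isBefore(to)).toSeq

// expand(LocalDate.parse("2017-04-03"), LocalDate.parse("2017-03-31"))
// -> List(2017-04-03, 2017-04-02, 2017-04-01, 2017-03-31)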
Answer 0 (score: 2)
I tried the code snippet below and it worked for me.
import org.apache.spark.sql.functions._
import org.joda.time.LocalDate

// iterate day by day from 'start' (inclusive) up to 'end' (exclusive)
def dayIterator(start: LocalDate, end: LocalDate) = Iterator.iterate(start)(_ plusDays 1) takeWhile (_ isBefore end)

def dateSeries(date1: String, date2: String): Array[String] = {
  val fromDate = new LocalDate(date1.split(" ")(0))
  val toDate = new LocalDate(date2.split(" ")(0))
  val series = dayIterator(fromDate, toDate).toArray
  val arr = series.map(a => a.toString() + " 00:00:00.0")
  arr
}

val DateSeries = udf(dateSeries(_: String, _: String))

scala> dateSeries("2017-03-31 00:00:00.0", "2017-04-03 00:00:00.0")
res53: Array[String] = Array(2017-03-31, 2017-04-01, 2017-04-02)
One thing I still cannot figure out: even though the map operation inside dateSeries appends " 00:00:00.0", the returned array does not contain the appended string.
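For reference, a minimal sketch (not part of the original answer) of how the registered DateSeries UDF could be applied, assuming a hypothetical DataFrame df with two string columns fromDate and toDate holding the timestamps:

import org.apache.spark.sql.functions.col

// df, "fromDate" and "toDate" are assumed names, not from the original post
val withSeries = df.withColumn("dateSeries", DateSeries(col("fromDate"), col("toDate")))
withSeries.show(false)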
Answer 1 (score: 1)
Creating a UDF that computes the dates between fromDate and toDate will solve the problem. For simplicity I have used the Joda-Time API, so you need to add that dependency.
For SBT:
libraryDependencies += "joda-time" % "joda-time" % "2.8.1"
Here is an example for your problem:
import spark.implicits._
import org.apache.spark.sql.functions.udf
import org.joda.time.format.DateTimeFormat
val data = spark.sparkContext.parallelize(Seq(
  ("2017-04-03 00:00:00,2017-03-31 00:00:00"),
  ("2017-03-31 00:00:00,2017-03-30 00:00:00"),
  ("2017-03-30 00:00:00,2017-03-29 00:00:00"),
  ("2017-03-29 00:00:00,2017-03-28 00:00:00"),
  ("2017-03-28 00:00:00,2017-03-27 00:00:00"),
  ("2017-04-03 00:00:00,2017-03-31 00:00:00"),
  ("2017-04-06 00:00:00,2017-04-05 00:00:00")
)).toDF("dateRanges")
val calculateDate = udf((date: String) => {
  val dtf = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss")
  val from = dtf.parseDateTime(date.split(",")(0)).toDateTime()
  val to = dtf.parseDateTime(date.split(",")(1)).toDateTime()
  val dates = scala.collection.mutable.MutableList[String]()
  // walk from 'from' towards 'to', one day at a time, in whichever direction is needed
  var current = from
  while (current.getMillis != to.getMillis) {
    dates += current.toString(dtf)
    if (current.getMillis > to.getMillis) current = current.minusDays(1)
    else current = current.plusDays(1)
  }
  dates += to.toString(dtf) // include the end of the range as well
  dates
})
data.withColumn("newDate", calculateDate(data("dateRanges")))
It works for both cases, whether toDate is smaller or larger than fromDate.
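To inspect the generated column, the result of withColumn can be displayed; a small usage sketch (the .show call is not in the original answer):

data.withColumn("newDate", calculateDate(data("dateRanges"))).show(false)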
Hope this helps!