I have:
import spark.implicits._
import org.apache.spark.sql.functions._
val someDF = Seq(
  (8, "K25", "2019-05-22"),
  (64, "K25", "2019-05-26"),
  (64, "K25", "2019-03-26"),
  (27, "K26", "2019-02-24")
).toDF("Number", "ID", "Date").withColumn("Date", to_date(col("Date")))
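My goal is to filter this DataFrame by a date range. Say I want the rows whose date falls within the three months leading up to 2019-05-26. How should I go about it?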
Answer 0 (score: 1)
You can use filter as follows:
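val someDF = Seq(
  (8, "K25", "2019-05-22"),
  (64, "K25", "2019-05-26"),
  (64, "K25", "2019-03-26"),
  (27, "K26", "2019-02-24")
).toDF("Number", "ID", "Date").withColumn("Date", to_date(col("Date")))

// Upper bound of the range, built once and reused in both comparisons
val compareDate = to_date(lit("2019-05-26"), "yyyy-MM-dd")

// Keep rows strictly between (compareDate - 3 months) and compareDate;
// both bounds are exclusive, so 2019-05-26 itself is dropped
someDF.filter(
  $"Date" < compareDate &&
  $"Date" > add_months(compareDate, -3)
).show(false)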
If you know both the date and its format, you can simply use the date string directly.
Output:
+------+---+----------+
|Number|ID |Date      |
+------+---+----------+
|8 |K25|2019-05-22|
|64 |K25|2019-03-26|
+------+---+----------+
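To see why exactly these two rows survive, the date arithmetic can be checked outside Spark. The following is only a minimal sketch in plain Scala using java.time; the object name DateRangeCheck and the values exclusive and inclusive are hypothetical names introduced for illustration, not part of the answer. It assumes that subtracting three months from 2019-05-26 yields 2019-02-26, which is what add_months produces for this date, and it contrasts the strict comparison used in the answer with an inclusive one.

```scala
import java.time.LocalDate

// Hypothetical helper object, used only to illustrate the range logic
object DateRangeCheck {
  // Same sample rows as in the question: (Number, ID, Date)
  val rows: Seq[(Int, String, LocalDate)] = Seq(
    (8, "K25", LocalDate.parse("2019-05-22")),
    (64, "K25", LocalDate.parse("2019-05-26")),
    (64, "K25", LocalDate.parse("2019-03-26")),
    (27, "K26", LocalDate.parse("2019-02-24"))
  )

  val upper: LocalDate = LocalDate.parse("2019-05-26")
  val lower: LocalDate = upper.minusMonths(3) // three months before the upper bound

  // Strict bounds, mirroring `Date > lower && Date < upper` from the answer
  val exclusive: Seq[(Int, String, LocalDate)] =
    rows.filter { case (_, _, d) => d.isAfter(lower) && d.isBefore(upper) }

  // Inclusive bounds: rows dated exactly on either boundary are kept as well
  val inclusive: Seq[(Int, String, LocalDate)] =
    rows.filter { case (_, _, d) => !d.isBefore(lower) && !d.isAfter(upper) }

  def main(args: Array[String]): Unit = {
    println(s"lower bound: $lower")
    println("exclusive:")
    exclusive.foreach(println)
    println("inclusive:")
    inclusive.foreach(println)
  }
}
```

With strict bounds the row dated 2019-05-26 is excluded, which matches the output above. If the boundary dates should be included, the Spark filter could instead be written with Column.between, for example $"Date".between(add_months(compareDate, -3), compareDate), since between is inclusive on both ends.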