Get the week start date and week end date from a date

Time: 2020-07-15 10:02:42

Tags: pyspark apache-spark-sql

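I need to get the week start date and week end date from a given date, where the week runs from Sunday to Saturday.

I referred to this post, but it treats Monday as the start of the week. Is there a built-in function in Spark that handles this?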

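2 answers:

Answer 0: (score: 3)

Find the day of the week with dayofweek (1 = Sunday, ..., 7 = Saturday), then use selectExpr to derive the week start and week end columns, treating Sunday as the week start date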

from pyspark.sql import functions as F


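# Sample row: 2020-07-13 is a Monday
df_b = spark.createDataFrame([('1','2020-07-13')],[ "ID","date"])
# dayofweek returns 1 for Sunday through 7 for Saturday
df_b = df_b.withColumn('day_of_week', F.dayofweek(F.col('date')))
# Step back to the preceding Sunday, then forward to the following Saturday
df_b = df_b.selectExpr('*', 'date_sub(date, day_of_week-1) as week_start')
df_b = df_b.selectExpr('*', 'date_add(date, 7-day_of_week) as week_end')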

df_b.show()

+---+----------+-----------+----------+----------+
| ID|      date|day_of_week|week_start|  week_end|
+---+----------+-----------+----------+----------+
|  1|2020-07-13|          2|2020-07-12|2020-07-18|
+---+----------+-----------+----------+----------+
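The offsets above can be sanity-checked without a Spark session. This is only a plain-Python sketch of the same arithmetic: `spark_dayofweek` and `sunday_week_bounds` are hypothetical helper names, not Spark functions, and the sketch assumes Spark's `dayofweek` numbering of 1 = Sunday through 7 = Saturday.

```python
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    # Mirror Spark's dayofweek numbering: 1 = Sunday, ..., 7 = Saturday.
    # isoweekday() gives Monday = 1 ... Sunday = 7, so remap it.
    return d.isoweekday() % 7 + 1


def sunday_week_bounds(d: date):
    dow = spark_dayofweek(d)
    week_start = d - timedelta(days=dow - 1)  # like date_sub(date, day_of_week-1)
    week_end = d + timedelta(days=7 - dow)    # like date_add(date, 7-day_of_week)
    return week_start, week_end


# A Sunday, a Monday and a Saturday from the same Sunday-to-Saturday week
for d in (date(2020, 7, 12), date(2020, 7, 13), date(2020, 7, 18)):
    print(d, sunday_week_bounds(d))
```

All three dates should map to the same week, 2020-07-12 through 2020-07-18, which shows that the boundary days (Sunday and Saturday) stay inside their own week.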

Update: in Spark SQL

First create a temporary view from the DataFrame

df_a.createOrReplaceTempView("df_a_sql")

Then run the query

%sql
select *, date_sub(date, day_of_week - 1) as week_start,
date_add(date, 7 - day_of_week) as week_end
from
(select *, dayofweek(date) as day_of_week
from df_a_sql) T

Output

+---+----------+-----------+----------+----------+
| ID|      date|day_of_week|week_start|  week_end|
+---+----------+-----------+----------+----------+
|  1|2020-07-13|          2|2020-07-12|2020-07-18|
+---+----------+-----------+----------+----------+

Answer 1: (score: 0)

Perhaps this is helpful -

Load the test data

    val df = spark.sql("select cast('2020-07-12' as date) as date")
    df.show(false)
    df.printSchema()

    /**
      * +----------+
      * |date      |
      * +----------+
      * |2020-07-12|
      * +----------+
      *
      * root
      * |-- date: date (nullable = true)
      */

Week starting on Sunday and ending on Saturday


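    // week starting from SUNDAY and ending SATURDAY
    // next_day returns the first matching day strictly after its input, so shift
    // back one day first: a date that is already a Saturday stays in its own week
    df.withColumn("week_end", next_day(date_sub($"date", 1), "SAT"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)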

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-18|2020-07-12|
      * +----------+----------+----------+
      */

Week starting on Monday and ending on Sunday


    // week starting from MONDAY and ending SUNDAY
    // next_day returns the first matching day strictly after its input, so shift
    // back one day first: 2020-07-12 is itself a Sunday and must end its own week
    df.withColumn("week_end", next_day(date_sub($"date", 1), "SUN"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-12|2020-07-06|
      * +----------+----------+----------+
      */

Week starting on Tuesday and ending on Monday

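    // week starting from TUESDAY and ending MONDAY
    // next_day returns the first matching day strictly after its input, so shift
    // back one day first: a date that is already a Monday stays in its own week
    df.withColumn("week_end", next_day(date_sub($"date", 1), "MON"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)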

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-13|2020-07-07|
      * +----------+----------+----------+
      */
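
One thing worth checking with the next_day approach: next_day returns the first matching weekday strictly after its input, so a date that already falls on the last day of the week needs care. The sketch below emulates that behaviour in plain Python and compares calling it on the date directly with shifting the date back one day first. The helper names (`next_day`, `week_bounds_direct`, `week_bounds_shifted`) are hypothetical and only meant to illustrate the arithmetic; weekdays are given as ISO numbers (Monday = 1, ..., Sunday = 7), which is an assumption of this sketch and not how Spark takes the argument.

```python
from datetime import date, timedelta


def next_day(d: date, target_isoweekday: int) -> date:
    # First date strictly after d that falls on the target weekday
    delta = (target_isoweekday - d.isoweekday()) % 7
    if delta == 0:
        delta = 7  # d is already on the target weekday, so jump a full week
    return d + timedelta(days=delta)


def week_bounds_direct(d: date, week_end_isoweekday: int):
    # Applies next_day to the date itself
    week_end = next_day(d, week_end_isoweekday)
    return week_end - timedelta(days=6), week_end


def week_bounds_shifted(d: date, week_end_isoweekday: int):
    # Shifts back one day first, so a date on the week's last day stays put
    week_end = next_day(d - timedelta(days=1), week_end_isoweekday)
    return week_end - timedelta(days=6), week_end


sunday = date(2020, 7, 12)
SUN = 7
# Monday-to-Sunday week containing a Sunday
print("direct: ", week_bounds_direct(sunday, SUN))
print("shifted:", week_bounds_shifted(sunday, SUN))
```

For 2020-07-12 with a Monday-to-Sunday week, the direct version reports 2020-07-13 to 2020-07-19, a range that does not contain the input date, while the shifted version reports 2020-07-06 to 2020-07-12. For dates that are not on the last day of the week, both versions agree.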