Query to get the dates of every Sunday and Saturday in Hive or PySpark

Time: 2020-11-03 07:46:54

Tags: pyspark hive hiveql

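I want to get the dates of all Sundays and Saturdays in Hive, starting from a given date. For example, if the given date is 2020-10-01, the query should return two rows, sunday_dates and saturday_dates, holding all the Sundays and Saturdays after "2020-10-01".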

I tried something like the following, but it does not seem to work for me.

    spark.sql("select date_sub('2020-10-01', cast(date_format(current_date(),'u')%7 as int)) as sunday_dates").show(10,False)
    +------------+
    |sunday_dates|
    +------------+
    |2020-09-29  |
    +------------+

Is there any way to achieve this in Hive or PySpark?

Thanks!

1 answer:

Answer 0: (score: 1)

You need date_trunc() to get to the start date of the week, then date_sub() and date_add() to get to Saturday and Sunday.
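The offsets involved can be checked in plain Python before running anything on a cluster. This is only a sketch of the arithmetic, not Spark code: it assumes the week starts on Monday, which is what the week_start column in the output of this answer shows, and the helper names `week_start` and `weekend_offsets` are made up for illustration.

```python
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d (assumed equivalent of truncating to WEEK)."""
    # date.weekday() returns 0 for Monday through 6 for Sunday
    return d - timedelta(days=d.weekday())


def weekend_offsets(d: date) -> dict:
    """Weekend days on either side of the Monday that starts d's week."""
    start = week_start(d)
    return {
        "backward_saturday": start - timedelta(days=2),
        "backward_sunday": start - timedelta(days=1),
        "forward_saturday": start + timedelta(days=5),
        "forward_sunday": start + timedelta(days=6),
    }


result = weekend_offsets(date(2020, 11, 3))
for name, value in result.items():
    # Print each date with its weekday name to confirm the offsets
    print(name, value.isoformat(), value.strftime("%A"))
```

Subtracting 2 and 1 from the Monday gives the previous Saturday and Sunday, and adding 5 and 6 gives the upcoming ones.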

Create the DataFrame here:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([("2020-11-02",1),("2020-11-03",2),("2020-11-04",3)],["event_dt","word"])
    df.show()
    # Truncate each date to the start of its week (Monday, per the output below)
    df = df.withColumn("week_start", F.date_trunc('WEEK', F.col("event_dt")))
    # In case you want the previous (backward) weekend days
    df = df.selectExpr('*', 'date_sub(week_start, 2) as backward_Saturday')
    df = df.selectExpr('*', 'date_sub(week_start, 1) as backward_Sunday')
    # In case you want the upcoming (forward) weekend days
    df = df.selectExpr('*', 'date_add(week_start, 5) as forward_Saturday')
    df = df.selectExpr('*', 'date_add(week_start, 6) as forward_Sunday')
    df.show()

Input

+----------+----+
|  event_dt|word|
+----------+----+
|2020-11-02|   1|
|2020-11-03|   2|
|2020-11-04|   3|
+----------+----+

Output

+----------+----+-------------------+-----------------+---------------+----------------+--------------+
|  event_dt|word|         week_start|backward_Saturday|backward_Sunday|forward_Saturday|forward_Sunday|
+----------+----+-------------------+-----------------+---------------+----------------+--------------+
|2020-11-02|   1|2020-11-02 00:00:00|       2020-10-31|     2020-11-01|      2020-11-07|    2020-11-08|
|2020-11-03|   2|2020-11-02 00:00:00|       2020-10-31|     2020-11-01|      2020-11-07|    2020-11-08|
|2020-11-04|   3|2020-11-02 00:00:00|       2020-10-31|     2020-11-01|      2020-11-07|    2020-11-08|
+----------+----+-------------------+-----------------+---------------+----------------+--------------+
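
The output above gives only the weekends adjacent to each event date, while the question asks for every Saturday and Sunday after a given date. A minimal plain-Python sketch of that enumeration follows. It assumes an explicit end date, which the question does not specify, and the function name `weekend_dates` is hypothetical. In Spark SQL 2.4 or later, a similar result can probably be built from the built-in `sequence`, `explode`, and `dayofweek` functions, though the answer does not show that.

```python
from datetime import date, timedelta


def weekend_dates(start: date, end: date):
    """Return (saturdays, sundays) strictly after start, up to and including end."""
    saturdays, sundays = [], []
    current = start + timedelta(days=1)
    while current <= end:
        # date.weekday(): 5 is Saturday, 6 is Sunday
        if current.weekday() == 5:
            saturdays.append(current)
        elif current.weekday() == 6:
            sundays.append(current)
        current += timedelta(days=1)
    return saturdays, sundays


# The end date here is an assumption chosen for illustration
saturdays, sundays = weekend_dates(date(2020, 10, 1), date(2020, 10, 31))
print("saturday_dates:", [d.isoformat() for d in saturdays])
print("sunday_dates:", [d.isoformat() for d in sundays])
```

For 2020-10-01 through 2020-10-31 this yields five Saturdays (the 3rd, 10th, 17th, 24th, and 31st) and four Sundays (the 4th, 11th, 18th, and 25th).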