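I need to get the week start date and week end date for a given date, where the week runs from Sunday to Saturday.
I referred to this post, but it treats Monday as the first day of the week. Is there a built-in function in Spark that handles this?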
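Answer 0 (score: 3)
Find the day of the week with dayofweek (which returns 1 for Sunday through 7 for Saturday), then use selectExpr to derive the week start and week end columns, treating Sunday as the first day of the week.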
from pyspark.sql import functions as F
df_b = spark.createDataFrame([('1','2020-07-13')],[ "ID","date"])
df_b = df_b.withColumn('day_of_week', F.dayofweek(F.col('date')))
df_b = df_b.selectExpr('*', 'date_sub(date, day_of_week-1) as week_start')
df_b = df_b.selectExpr('*', 'date_add(date, 7-day_of_week) as week_end')
df_b.show()
+---+----------+-----------+----------+----------+
| ID|      date|day_of_week|week_start|  week_end|
+---+----------+-----------+----------+----------+
|  1|2020-07-13|          2|2020-07-12|2020-07-18|
+---+----------+-----------+----------+----------+
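The arithmetic behind week_start and week_end can be checked without a Spark session. The following is a minimal sketch in plain Python, assuming Spark's dayofweek numbering (1 = Sunday, 7 = Saturday); the helper names spark_dayofweek and sunday_week_bounds are hypothetical and not part of any Spark API.

```python
from datetime import date, timedelta
from typing import Tuple


def spark_dayofweek(d: date) -> int:
    """Mimic Spark's dayofweek numbering: 1 = Sunday, ..., 7 = Saturday."""
    # isoweekday() is 1 = Monday ... 7 = Sunday, so Sunday wraps around to 1.
    return d.isoweekday() % 7 + 1


def sunday_week_bounds(d: date) -> Tuple[date, date]:
    """Return (week_start, week_end) for a Sunday-to-Saturday week."""
    dow = spark_dayofweek(d)
    week_start = d - timedelta(days=dow - 1)  # same as date_sub(date, day_of_week - 1)
    week_end = d + timedelta(days=7 - dow)    # same as date_add(date, 7 - day_of_week)
    return week_start, week_end


print(sunday_week_bounds(date(2020, 7, 13)))
# (datetime.date(2020, 7, 12), datetime.date(2020, 7, 18))
```

A Sunday maps to itself as week_start and a Saturday maps to itself as week_end, so the boundary days stay inside their own week.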
Update: the same in Spark SQL
First, create a temporary view from the DataFrame (df_a here, with the ID and date columns):
df_a.createOrReplaceTempView("df_a_sql")
Then run the query:
%sql
-- dayofweek returns 1 for Sunday through 7 for Saturday
select *, date_sub(date, dayofweek - 1) as week_start,
       date_add(date, 7 - dayofweek) as week_end
from
  (select *, dayofweek(date) as dayofweek
   from df_a_sql) T
Output
+---+----------+---------+----------+----------+
| ID|      date|dayofweek|week_start|  week_end|
+---+----------+---------+----------+----------+
|  1|2020-07-13|        2|2020-07-12|2020-07-18|
+---+----------+---------+----------+----------+
Answer 1 (score: 0)
Perhaps this helps:
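// Already in scope in spark-shell; needed in a standalone application
import org.apache.spark.sql.functions.{date_sub, next_day}

val df = spark.sql("select cast('2020-07-12' as date) as date")
df.show(false)
df.printSchema()
/**
 * +----------+
 * |date      |
 * +----------+
 * |2020-07-12|
 * +----------+
 *
 * root
 * |-- date: date (nullable = true)
 */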
// week starting from SUNDAY and ending SATURDAY
// next_day returns the first match strictly after its input, so step back one day
// first; otherwise a Saturday would be assigned to the following week
df.withColumn("week_end", next_day(date_sub($"date", 1), "SAT"))
  .withColumn("week_start", date_sub($"week_end", 6))
  .show(false)
/**
 * +----------+----------+----------+
 * |date      |week_end  |week_start|
 * +----------+----------+----------+
 * |2020-07-12|2020-07-18|2020-07-12|
 * +----------+----------+----------+
 */
// week starting from MONDAY and ending SUNDAY
// next_day returns the first match strictly after its input, so step back one day
// first; 2020-07-12 is itself a Sunday and therefore ends its own week
df.withColumn("week_end", next_day(date_sub($"date", 1), "SUN"))
  .withColumn("week_start", date_sub($"week_end", 6))
  .show(false)
/**
 * +----------+----------+----------+
 * |date      |week_end  |week_start|
 * +----------+----------+----------+
 * |2020-07-12|2020-07-12|2020-07-06|
 * +----------+----------+----------+
 */
// week starting from TUESDAY and ending MONDAY
// next_day returns the first match strictly after its input, so step back one day
// first; otherwise a Monday would be assigned to the following week
df.withColumn("week_end", next_day(date_sub($"date", 1), "MON"))
  .withColumn("week_start", date_sub($"week_end", 6))
  .show(false)
/**
 * +----------+----------+----------+
 * |date      |week_end  |week_start|
 * +----------+----------+----------+
 * |2020-07-12|2020-07-13|2020-07-07|
 * +----------+----------+----------+
 */
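One detail worth checking with next_day: it returns the first matching weekday strictly later than the input date, so a date that already falls on the last day of the week is pushed into the following week unless the input is moved back one day first. The sketch below emulates that behaviour in plain Python; next_day here is a re-implementation for illustration only, and week_bounds is a hypothetical helper, not a Spark function.

```python
from datetime import date, timedelta

# Weekday abbreviations mapped to Python's weekday() numbering (Monday = 0).
_WEEKDAYS = {"MON": 0, "TUE": 1, "WED": 2, "THU": 3, "FRI": 4, "SAT": 5, "SUN": 6}


def next_day(d: date, day_of_week: str) -> date:
    """Emulate Spark's next_day: first matching weekday strictly after d."""
    # The "- 1 ... + 1" shift turns a same-day match into 7 days instead of 0.
    delta = (_WEEKDAYS[day_of_week] - d.weekday() - 1) % 7 + 1
    return d + timedelta(days=delta)


def week_bounds(d: date, last_day: str):
    """Hypothetical helper: (week_start, week_end) for a week ending on last_day."""
    # Stepping back one day lets d itself qualify as the week end.
    week_end = next_day(d - timedelta(days=1), last_day)
    return week_end - timedelta(days=6), week_end


saturday = date(2020, 7, 18)
print(next_day(saturday, "SAT"))     # 2020-07-25, one week too late for a week end
print(week_bounds(saturday, "SAT"))  # (datetime.date(2020, 7, 12), datetime.date(2020, 7, 18))
```

For dates that are not the last day of the week, both variants give the same result, so the one-day shift only changes the boundary case.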