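I have an ordinary timestamp column in my PySpark DataFrame. I want to add a new column that holds the start date of the week each timestamp falls in.
Answer 0 (score: 1)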
For Spark < 2.3.0 (where date_trunc is not yet available)
Use this:
from pyspark.sql import SparkSession
from pyspark.sql.functions import weekofyear, year, to_date, concat, lit, col
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

(
    spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp'])
    # Cast the string column to a real timestamp.
    .withColumn('timestamp', col('timestamp').astype(TimestampType()))
    # Extract the week number and the calendar year of each timestamp.
    .withColumn('week', weekofyear('timestamp'))
    .withColumn('year', year('timestamp'))
    # Parse "week/year" back into a date, which yields the first day of that week.
    .withColumn('date_of_the_week', to_date(concat('week', lit('/'), 'year'), "w/yyyy"))
    .show(truncate=False)
)
+-------------------+----+----+----------------+
|timestamp          |week|year|date_of_the_week|
+-------------------+----+----+----------------+
|2020-10-03 05:00:00|40  |2020|2020-09-27      |
+-------------------+----+----+----------------+
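One caveat with the week/year approach: the week number and the calendar year can disagree around New Year. The sketch below uses only the Python standard library (not Spark) to illustrate this. It assumes that weekofyear follows ISO 8601 week numbering, as Python's date.isocalendar() does; week_and_year is a hypothetical helper name introduced only for this illustration.

```python
from datetime import date


def week_and_year(d):
    """Return (ISO week number, calendar year), i.e. the two values
    that the snippet above concatenates into "week/year"."""
    iso_week = d.isocalendar()[1]  # ISO 8601 week number
    return iso_week, d.year


# The date from the example: week and year agree.
print(week_and_year(date(2020, 10, 3)))  # (40, 2020)

# 1 January 2021 belongs to ISO week 53 of 2020, but its calendar year is 2021.
print(week_and_year(date(2021, 1, 1)))   # (53, 2021)
```

For a date like the second one, parsing "53/2021" may not give the week that contains the original date, so it is worth testing rows near year boundaries before relying on this approach.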
For Spark >= 2.3.0 (where date_trunc is available)
from pyspark.sql import SparkSession
from pyspark.sql.functions import date_trunc, col
from pyspark.sql.types import TimestampType

spark = SparkSession.builder.getOrCreate()

(
    spark.createDataFrame([['2020-10-03 05:00:00']], schema=['timestamp'])
    # Cast the string column to a real timestamp.
    .withColumn('timestamp', col('timestamp').astype(TimestampType()))
    # Truncate each timestamp to the start of its week (Monday, 00:00:00).
    .withColumn('date_of_the_week', date_trunc('week', 'timestamp'))
    .show(truncate=False)
)
+-------------------+-------------------+
|timestamp          |date_of_the_week   |
+-------------------+-------------------+
|2020-10-03 05:00:00|2020-09-28 00:00:00|
+-------------------+-------------------+
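Note that the two approaches return different dates for the same input: 2020-09-27 (a Sunday) from the week/year parsing and 2020-09-28 (a Monday) from date_trunc. date_trunc with 'week' truncates to Monday; the Sunday result presumably comes from a locale-dependent first day of the week in the older date parser. The following is a minimal plain-Python sketch (no Spark involved) of both conventions, useful for checking which one you actually want; week_start and its first_day parameter are hypothetical names, not part of any Spark API.

```python
from datetime import date, timedelta


def week_start(d, first_day="monday"):
    """Return the first day of the week containing d.

    first_day selects the convention: "monday" (ISO 8601) or "sunday".
    """
    if first_day == "monday":
        offset = d.weekday()            # Monday=0 ... Sunday=6
    elif first_day == "sunday":
        offset = (d.weekday() + 1) % 7  # Sunday=0 ... Saturday=6
    else:
        raise ValueError("first_day must be 'monday' or 'sunday'")
    return d - timedelta(days=offset)


d = date(2020, 10, 3)           # a Saturday
print(week_start(d))            # 2020-09-28
print(week_start(d, "sunday"))  # 2020-09-27
```

If a plain date rather than a midnight timestamp is needed, the date_trunc result can be wrapped in to_date.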