How do I write the following SQL in PySpark?
select FORMAT(DATEADD(HOUR, addHours, Coltimestamp), 'yyyy-MM-dd HH:mm') AS TransDate;
I have these two columns:
from pyspark.sql.types import StringType
df = spark.createDataFrame(
    [
        ("2020-06-17T00:44:30", 2),
        ("2020-06-17T06:06:56", 3),
        ("2020-06-17T15:04:34", 5),
    ],
    ['Coltimestamp', 'addHours'])
I want to add the number of hours in the addHours column to Coltimestamp, storing the result in a new column of the same DataFrame.
Answer 0 (score: 0)
This may answer your question.
>>> df.show()
+-------------------+--------+
| Coltimestamp|addHours|
+-------------------+--------+
|2017-03-12 03:19:58| 2|
|2017-03-12 03:21:30| 1|
|2017-03-12 03:29:40| 3|
|2017-03-12 03:31:23| 5|
|2017-03-12 04:19:47| 6|
|2017-03-12 04:33:51| 7|
+-------------------+--------+
>>> from pyspark.sql import functions as F
>>> df.withColumn("new_time", (F.unix_timestamp(df.Coltimestamp) + df.addHours * 60 * 60).cast('timestamp')).show()
+-------------------+--------+-------------------+
| Coltimestamp|addHours| new_time|
+-------------------+--------+-------------------+
|2017-03-12 03:19:58| 2|2017-03-12 05:19:58|
|2017-03-12 03:21:30| 1|2017-03-12 04:21:30|
|2017-03-12 03:29:40| 3|2017-03-12 06:29:40|
|2017-03-12 03:31:23| 5|2017-03-12 08:31:23|
|2017-03-12 04:19:47| 6|2017-03-12 10:19:47|
|2017-03-12 04:33:51| 7|2017-03-12 11:33:51|
+-------------------+--------+-------------------+