I have a DataFrame that contains Unix timestamps:
df = spark.createDataFrame(
    [
        (1527853209,),
        (1527852466,),
        (1527852178,),
        (1527851689,),
        (1527852214,),
    ],
    ["date_time"]
)
I use the following Spark configuration:
OPT_SPARK = {
    'master': 'yarn',
    'spark.executor.extraJavaOptions': '-Duser.timezone=GMT',
    'spark.driver.extraJavaOptions': '-Duser.timezone=GMT',
}
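Without changing the startup options, I want to convert the timestamps from "GMT" to "CET" (Central European Time).
I tried the following code:
from pyspark.sql import functions as F, types as T
from datetime import datetime
from dateutil import tz

def conv(in_ts):
    from_zone = tz.gettz('GMT')
    to_zone = tz.gettz('CET')
    # Read the epoch seconds as a naive UTC datetime, then tag it as GMT.
    utc = datetime.utcfromtimestamp(in_ts)
    utc = utc.replace(tzinfo=from_zone)
    # Convert to CET wall-clock time and drop the tzinfo (returns a naive datetime).
    n_ts = utc.astimezone(to_zone).replace(tzinfo=None)
    return n_ts

conv_udf = F.udf(conv, T.TimestampType())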
When I test the function on its own it works fine, but not in Spark:
# 1527853209 is GMT: Friday 1 June 2018 11:40:09
conv(1527853209)
datetime.datetime(2018, 6, 1, 13, 40, 9)
df.select(
    "date_time",
    F.col("date_time").cast("timestamp"),
    conv_udf("date_time")
).show()
+----------+-------------------+-------------------+
| date_time| date_time| conv(date_time)|
+----------+-------------------+-------------------+
|1527853209|2018-06-01 11:40:09|2018-06-01 11:40:09|
|1527852466|2018-06-01 11:27:46|2018-06-01 11:27:46|
|1527852178|2018-06-01 11:22:58|2018-06-01 11:22:58|
|1527851689|2018-06-01 11:14:49|2018-06-01 11:14:49|
|1527852214|2018-06-01 11:23:34|2018-06-01 11:23:34|
+----------+-------------------+-------------------+
I could not find any built-in function for this, so a UDF seemed like the best solution, but apparently it does not work as expected.
Expected result:
+----------+-------------------+-------------------+
| date_time| date_time| conv(date_time)|
+----------+-------------------+-------------------+
|1527853209|2018-06-01 11:40:09|2018-06-01 13:40:09|
|1527852466|2018-06-01 11:27:46|2018-06-01 13:27:46|
|1527852178|2018-06-01 11:22:58|2018-06-01 13:22:58|
|1527851689|2018-06-01 11:14:49|2018-06-01 13:14:49|
|1527852214|2018-06-01 11:23:34|2018-06-01 13:23:34|
+----------+-------------------+-------------------+
Answer 0 (score: 0)
I think what you are doing is correct, but when you print the result with show(), your conv(date_time) is converted to your time zone (GMT).
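A plain-Python sketch (no Spark involved) may help to see why the column looks unchanged. Converting between zones changes only the wall-clock rendering, not the instant, and a Spark timestamp stores the instant. The variable names are illustrative and not from the original answer. The last step rests on two assumptions: that PySpark reads a naive datetime returned by a UDF as local time of the Python worker (my reading of `TimestampType.toInternal`), and that the workers in the question run in a CET/CEST local zone.

```python
import calendar
from datetime import datetime, timezone

from dateutil import tz

in_ts = 1527853209  # first value from the question
cet_zone = tz.gettz("CET")

# One instant, two wall-clock renderings.
gmt = datetime.fromtimestamp(in_ts, tz=timezone.utc)
cet = gmt.astimezone(cet_zone)
wall_gmt = gmt.strftime("%Y-%m-%d %H:%M:%S")
wall_cet = cet.strftime("%Y-%m-%d %H:%M:%S")

# Epoch seconds of the aware CET value: the same instant as the input.
epoch_aware = calendar.timegm(cet.utctimetuple())

# The UDF in the question returns the naive value (tzinfo dropped).
# Assumption: the worker's local zone is CET, so the naive value is read back as CET.
naive = cet.replace(tzinfo=None)
epoch_naive_as_cet = calendar.timegm(naive.replace(tzinfo=cet_zone).utctimetuple())

print(wall_gmt, "|", wall_cet, "|", epoch_aware == in_ts, "|", epoch_naive_as_cet == in_ts)
# prints: 2018-06-01 11:40:09 | 2018-06-01 13:40:09 | True | True
```

Under those assumptions the UDF hands Spark the same instant it received, and show() renders that instant in the GMT session zone, which would explain the two identical columns.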
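If all you need is to display the datetime in the new time zone (CET), you can rewrite the UDF to return a StringType:
# Same imports as in the question.
from pyspark.sql import functions as F, types as T
from datetime import datetime
from dateutil import tz

def conv(in_ts):
    from_zone = tz.gettz('GMT')
    to_zone = tz.gettz('CET')
    utc = datetime.utcfromtimestamp(in_ts)
    utc = utc.replace(tzinfo=from_zone)
    n_ts = utc.astimezone(to_zone)
    # '%x %X' renders the date and time in the locale's own format.
    return n_ts.strftime('%x %X')

conv_udf = F.udf(conv, T.StringType())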