Converting a daylight saving time string to a timestamp gives wrong results

Time: 2018-01-10 20:49:31

Tags: sql timezone apache-spark-sql dst timestamp-with-timezone

I have a PySpark DataFrame. It has a column named test_time whose data type is string.

>>> df
DataFrame[test_time: string]

df.show()

+-------------------+
|          test_time|
+-------------------+
|2017-03-12 02:41:06|
|2017-03-12 02:43:52|
|2017-03-12 02:56:32|
|2017-03-12 03:16:23|
|2017-03-12 03:17:15|
|2017-03-12 03:22:19|
|2017-03-12 03:52:19|
|2017-03-12 04:03:21|
+-------------------+

Now I want to convert the test_time column from string to timestamp.

This is what I did:

from pyspark.sql import functions as F
df1 = df.withColumn('convert_test', F.unix_timestamp('test_time', "yyyy-MM-dd hh:mm:ss").cast('timestamp'))

>>> df1
DataFrame[test_time: string, convert_test: timestamp]

df1.show()

+-------------------+--------------------+
|          test_time|        convert_test|
+-------------------+--------------------+
|2017-03-12 02:41:06|2017-03-12 03:41:...|
|2017-03-12 02:43:52|2017-03-12 03:43:...|
|2017-03-12 02:56:32|2017-03-12 03:56:...|
|2017-03-12 03:16:23|2017-03-12 03:16:...|
|2017-03-12 03:17:15|2017-03-12 03:17:...|
|2017-03-12 03:22:19|2017-03-12 03:22:...|
|2017-03-12 03:52:19|2017-03-12 03:52:...|
|2017-03-12 04:03:21|2017-03-12 04:03:...|
+-------------------+--------------------+

As you can see, the Hours differ for rows 1-3.

FYI, my timezone is PST, and the times in rows 1-3 are in PST.

How can I do the conversion correctly?

1 Answer:

Answer 0 (score: 0)

I get the correct output using unix_timestamp():

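  import org.apache.spark.sql.functions.unix_timestamp
  // Assumes a SparkSession named `spark` is in scope (as in spark-shell);
  // its implicits provide toDF and the $"..." column syntax.
  import spark.implicits._

  val dataframe = Seq(
    ("2017-03-12 02:41:06"),
    ("2017-03-12 02:43:52"),
    ("2017-03-12 02:56:32"),
    ("2017-03-12 03:16:23"),
    ("2017-03-12 03:17:15"),
    ("2017-03-12 03:22:19"),
    ("2017-03-12 03:52:19"),
    ("2017-03-12 04:03:21")
  ).toDF("test_time")

  // HH is the 24-hour field; hh would be the 12-hour clock field.
  dataframe.withColumn("convert_test", unix_timestamp($"test_time", "yyyy-MM-dd HH:mm:ss").cast("timestamp")).show()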

Output:

+-------------------+--------------------+
|          test_time|        convert_test|
+-------------------+--------------------+
|2017-03-12 02:41:06|2017-03-12 02:41:...|
|2017-03-12 02:43:52|2017-03-12 02:43:...|
|2017-03-12 02:56:32|2017-03-12 02:56:...|
|2017-03-12 03:16:23|2017-03-12 03:16:...|
|2017-03-12 03:17:15|2017-03-12 03:17:...|
|2017-03-12 03:22:19|2017-03-12 03:22:...|
|2017-03-12 03:52:19|2017-03-12 03:52:...|
|2017-03-12 04:03:21|2017-03-12 04:03:...|
+-------------------+--------------------+
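
The table above keeps 02:41 intact, whereas the question's output moves it to 03:41. A likely explanation (an assumption, since neither post states the JVM or session time zone) is that the hour from 02:00 to 03:00 on 2017-03-12 does not exist in US Pacific time, because daylight saving time began that night. A session running in that zone cannot represent those wall-clock times. The plain-Python sketch below is not Spark code; it only reproduces the effect with the standard library, assuming `America/Los_Angeles` stands for the asker's "PST". `round_trip` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumption: the asker's "PST" corresponds to this IANA zone.
PACIFIC = ZoneInfo("America/Los_Angeles")
FMT = "%Y-%m-%d %H:%M:%S"


def round_trip(text, zone=PACIFIC):
    """Read a wall-clock string in `zone`, convert to UTC, and format it back in `zone`."""
    local = datetime.strptime(text, FMT).replace(tzinfo=zone)
    instant = local.astimezone(timezone.utc)  # the absolute point in time
    return instant.astimezone(zone).strftime(FMT)


for text in ("2017-03-12 02:41:06", "2017-03-12 03:16:23"):
    print(text, "->", round_trip(text))
```

In this sketch the 02:41:06 value comes back as 03:41:06 while 03:16:23 is unchanged, which matches the pattern in the question. Passing `timezone.utc` as `zone` leaves both values untouched, because UTC has no daylight saving transitions.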

If your timezone is different, you can use functions such as from_utc_timestamp() and to_utc_timestamp() to convert the timestamp.
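
To make the idea behind those two functions concrete, here is a small standard-library sketch. It is an analogue, not Spark's implementation: `from_utc` and `to_utc` are hypothetical helpers that only mirror the general behaviour of treating a wall-clock value as UTC and rendering it in a named zone, and the reverse. The zone name `America/Los_Angeles` is an assumption for the asker's "PST".

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

FMT = "%Y-%m-%d %H:%M:%S"


def from_utc(text, zone_name):
    """Treat `text` as a UTC wall-clock time and return the wall-clock time in `zone_name`."""
    utc = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(ZoneInfo(zone_name)).strftime(FMT)


def to_utc(text, zone_name):
    """Treat `text` as a wall-clock time in `zone_name` and return the UTC wall-clock time."""
    local = datetime.strptime(text, FMT).replace(tzinfo=ZoneInfo(zone_name))
    return local.astimezone(timezone.utc).strftime(FMT)


# Reading the string as UTC avoids the missing hour, because UTC has no DST gap.
shifted = from_utc("2017-03-12 02:41:06", "America/Los_Angeles")
print(shifted)                                  # wall-clock time in Pacific
print(to_utc(shifted, "America/Los_Angeles"))   # back to the original UTC value
```

The design point is that a string parsed as UTC always maps to a valid instant, so the conversion to a local zone happens afterwards as an explicit step. In Spark, the session time zone setting `spark.sql.session.timeZone` (available from Spark 2.2) may play the role of the parsing zone; whether it applies depends on the Spark version in use.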

Hope this helps!