Question

How do I convert a timestamp column to epoch seconds?

var df = sc.parallelize(Seq("2018-07-01T00:00:00Z")).toDF("date_string")
df = df.withColumn("timestamp", $"date_string".cast("timestamp"))
df.show(false)

DataFrame:

+--------------------+---------------------+
|date_string         |timestamp            |
+--------------------+---------------------+
|2018-07-01T00:00:00Z|2018-07-01 00:00:00.0|
+--------------------+---------------------+

Answer 1

If you have a timestamp column, you can cast it to long to get the epoch seconds:

df = df.withColumn("epoch_seconds", $"timestamp".cast("long"))
df.show(false)

DataFrame

+--------------------+---------------------+-------------+
|date_string         |timestamp            |epoch_seconds|
+--------------------+---------------------+-------------+
|2018-07-01T00:00:00Z|2018-07-01 00:00:00.0|1530403200   |
+--------------------+---------------------+-------------+
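As a quick sanity check outside Spark, the value in the table can be reproduced with plain java.time. This is only a minimal sketch: the object and method names (EpochCheck, epochSeconds) are made up for illustration and are not part of any Spark API.

```scala
import java.time.Instant

object EpochCheck {
  // Parse an ISO-8601 instant (with a trailing Z for UTC) and return
  // the number of whole seconds since 1970-01-01T00:00:00Z.
  def epochSeconds(iso: String): Long = Instant.parse(iso).getEpochSecond

  def main(args: Array[String]): Unit = {
    // Same instant as the date_string in the question.
    println(epochSeconds("2018-07-01T00:00:00Z")) // 1530403200
  }
}
```

The result matches the epoch_seconds column above, which suggests the cast treats the trailing Z in the string as UTC.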

Answer 2

Use unix_timestamp from org.apache.spark.sql.functions. It accepts either a timestamp column or a string column for which you can specify the format. From the documentation:

public static Column unix_timestamp(Column s)
Converts a time string in the format yyyy-MM-dd HH:mm:ss to a Unix timestamp (in seconds), using the default timezone and the default locale. Returns null on failure.
public static Column unix_timestamp(Column s, String p)
Converts a time string with the given pattern (see http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html) to a Unix timestamp (in seconds). Returns null on failure.

Use it as follows:

import org.apache.spark.sql.functions._

df.withColumn("epoch_seconds", unix_timestamp($"timestamp"))

Or, if the column is a string in some other format:

df.withColumn("epoch_seconds", unix_timestamp($"date_string", "yyyy-MM-dd'T'HH:mm:ss'Z'"))
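One caveat with the pattern above: quoting 'Z' makes it a literal character, so the zone is not read from the string, and the value is interpreted in the session (or default) time zone. Below is a rough sketch of two ways around that, assuming a Spark version that honours spark.sql.session.timeZone (2.2 or later, as far as I know). The object and method names are hypothetical, and the X pattern letter is my suggestion, not something from the documentation quoted above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.unix_timestamp

object EpochFromString {
  // Returns (epoch via literal-'Z' pattern, epoch via zone-aware X pattern).
  def epochs(spark: SparkSession): (Long, Long) = {
    import spark.implicits._
    // Assumption: pin the session time zone to UTC so that the
    // literal-'Z' pattern is interpreted as UTC on any machine.
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    val row = Seq("2018-07-01T00:00:00Z").toDF("date_string")
      .select(
        // 'Z' is matched as plain text; the zone comes from the session.
        unix_timestamp($"date_string", "yyyy-MM-dd'T'HH:mm:ss'Z'"),
        // X parses the zone designator (Z or an offset) from the string.
        unix_timestamp($"date_string", "yyyy-MM-dd'T'HH:mm:ssX"))
      .head()
    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("epoch-from-string")
      .getOrCreate()
    println(epochs(spark))
    spark.stop()
  }
}
```

With the session time zone pinned to UTC, both columns should agree with the cast-based result from Answer 1. Without it, the first one may be shifted by the local UTC offset.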

Answer 3

This is easy to do with the unix_timestamp function in Spark SQL, for example:

spark.sql("SELECT unix_timestamp(inv_time) AS time_as_long FROM agg_counts LIMIT 10").show()
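The query above refers to a table and column from my own job. For the DataFrame in the question, the same idea might look roughly like the sketch below. The view name (events), the column alias (ts), and the object and method names are placeholders chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

object EpochViaSql {
  // Registers the question's data as a temp view and queries it with SQL.
  def epochSeconds(spark: SparkSession): Long = {
    import spark.implicits._
    Seq("2018-07-01T00:00:00Z").toDF("date_string")
      .selectExpr("CAST(date_string AS TIMESTAMP) AS ts")
      .createOrReplaceTempView("events") // hypothetical view name

    spark.sql("SELECT unix_timestamp(ts) AS epoch_seconds FROM events")
      .head()
      .getLong(0)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("epoch-via-sql")
      .getOrCreate()
    println(epochSeconds(spark))
    spark.stop()
  }
}
```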

Hope this helps.

Answer 4

You can use the unix_timestamp function and cast the result to whatever data type you need.

Example:

import org.apache.spark.sql.types.LongType
val df1 = df.select(unix_timestamp($"date_string", "yyyy-MM-dd HH:mm:ss").cast(LongType).as("epoch_seconds"))
