What is the shortest and most efficient way in Spark SQL to convert a Timestamp column to a Long column holding a millisecond timestamp?
Here is an example of converting a timestamp to milliseconds:
scala> val ts = spark.sql("SELECT now() as ts")
ts: org.apache.spark.sql.DataFrame = [ts: timestamp]
scala> ts.show(false)
+-----------------------+
|ts                     |
+-----------------------+
|2019-06-18 12:32:02.41 |
+-----------------------+
scala> val tss = ts.selectExpr(
| "ts",
| "BIGINT(ts) as seconds_ts",
| "BIGINT(ts) * 1000 + BIGINT(date_format(ts, 'S')) as millis_ts"
| )
tss: org.apache.spark.sql.DataFrame = [ts: timestamp, seconds_ts: bigint ... 1 more field]
scala> tss.show(false)
+----------------------+----------+-------------+
|ts                    |seconds_ts|millis_ts    |
+----------------------+----------+-------------+
|2019-06-18 12:32:02.41|1560861122|1560861122410|
+----------------------+----------+-------------+
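As you can see, the most straightforward way to get milliseconds from a timestamp doesn't work: casting to long returns seconds, even though the timestamp itself retains the millisecond information.
The only way I found to extract the milliseconds is the date_format function, which is not nearly as simple as I expected.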
Does anyone know a simpler way to get the UNIX time in milliseconds from a Timestamp column?
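The workaround above rebuilds the value as whole seconds times 1000 plus the millisecond part of the fraction. That arithmetic can be checked outside Spark with a plain Scala sketch on java.sql.Timestamp. The object and helper names are hypothetical, and treating date_format(ts, 'S') as the 0 to 999 millisecond-of-second value is an assumption based on the output shown in the question.

```scala
import java.sql.Timestamp
import java.time.Instant

object MillisFromParts {
  // Hypothetical helper mirroring the question's SQL:
  // whole seconds * 1000 + the millisecond part of the fractional second
  def millisFromParts(ts: Timestamp): Long = {
    val seconds = Math.floorDiv(ts.getTime, 1000L) // the question's seconds_ts
    val millisOfSecond = ts.getNanos / 1000000      // 0..999, assumed to match the 'S' value
    seconds * 1000L + millisOfSecond
  }

  def main(args: Array[String]): Unit = {
    // Same instant as the question's example, pinned to UTC for reproducibility
    val ts = Timestamp.from(Instant.parse("2019-06-18T12:32:02.410Z"))
    println(millisFromParts(ts)) // 1560861122410
    println(ts.getTime)          // 1560861122410
  }
}
```

Both lines print the same value, which is why calling getTime directly can replace the two-step expression.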
Answer
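According to the code in Spark's DateTimeUtils:
"Timestamps are exposed externally as java.sql.Timestamp and are stored internally as longs, which are capable of storing timestamps with microsecond precision."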
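To illustrate what that comment implies, here is a sketch (not Spark's actual source) that encodes a Timestamp as microseconds since the epoch and then derives seconds and milliseconds by flooring division. The object and method names are hypothetical; the point is that the sub-second digits survive in the stored long even when a cast only exposes whole seconds.

```scala
import java.sql.Timestamp
import java.time.Instant

object MicrosSketch {
  // Illustrative encoding: microseconds since the epoch, as the quoted comment describes
  def toMicros(ts: Timestamp): Long =
    Math.floorDiv(ts.getTime, 1000L) * 1000000L + ts.getNanos / 1000L

  // Flooring division keeps pre-1970 values consistent
  def microsToSeconds(micros: Long): Long = Math.floorDiv(micros, 1000000L)
  def microsToMillis(micros: Long): Long = Math.floorDiv(micros, 1000L)

  def main(args: Array[String]): Unit = {
    val micros = toMicros(Timestamp.from(Instant.parse("2019-06-18T12:32:02.410Z")))
    println(micros)                  // 1560861122410000
    println(microsToSeconds(micros)) // 1560861122 (whole seconds, like the cast)
    println(microsToMillis(micros))  // 1560861122410
  }
}
```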
So if you define a UDF that takes a java.sql.Timestamp as input, you can simply call getTime to get a Long in milliseconds:
import org.apache.spark.sql.functions.udf
val tsConversionToLongUdf = udf((ts: java.sql.Timestamp) => ts.getTime) // epoch millis; throws an NPE if ts is null
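One caveat with this UDF: Spark passes null into a Scala UDF whose parameter is a reference type such as java.sql.Timestamp, so ts.getTime throws a NullPointerException on null rows. A minimal null-safe version of the function body, sketched in plain Scala with a hypothetical object name:

```scala
import java.sql.Timestamp
import java.time.Instant

object NullSafeMillis {
  // Hypothetical null-safe body for the UDF: null timestamps become None
  def tsToMillis(ts: Timestamp): Option[Long] = Option(ts).map(_.getTime)

  def main(args: Array[String]): Unit = {
    println(tsToMillis(Timestamp.from(Instant.parse("2017-01-18T10:00:00.111Z")))) // Some(1484733600111)
    println(tsToMillis(null))                                                      // None
  }
}
```

Wrapped as `udf((ts: java.sql.Timestamp) => NullSafeMillis.tsToMillis(ts))`, the Option[Long] result should surface as a nullable long column, with null rows mapped to null instead of failing the job.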
Applying the UDF to various timestamps:
import org.apache.spark.sql.functions.{col, to_timestamp}
import org.apache.spark.sql.types.LongType
import spark.implicits._ // enables .toDF on a local Seq

val df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.111", "2017-01-18 11:00:00.110", "2017-01-18 11:00:00.100")
  .toDF("timestampString")
  .withColumn("timestamp", to_timestamp(col("timestampString")))
  .withColumn("timestampConversionToLong", tsConversionToLongUdf(col("timestamp")))
  .withColumn("timestampCastAsLong", col("timestamp").cast(LongType))
df.printSchema()
df.show(false)
// returns
root
 |-- timestampString: string (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampConversionToLong: long (nullable = false)
 |-- timestampCastAsLong: long (nullable = true)
+-----------------------+-----------------------+-------------------------+-------------------+
|timestampString        |timestamp              |timestampConversionToLong|timestampCastAsLong|
+-----------------------+-----------------------+-------------------------+-------------------+
|2017-01-18 11:00:00.000|2017-01-18 11:00:00    |1484733600000            |1484733600         |
|2017-01-18 11:00:00.111|2017-01-18 11:00:00.111|1484733600111            |1484733600         |
|2017-01-18 11:00:00.110|2017-01-18 11:00:00.11 |1484733600110            |1484733600         |
|2017-01-18 11:00:00.100|2017-01-18 11:00:00.1  |1484733600100            |1484733600         |
+-----------------------+-----------------------+-------------------------+-------------------+
Note that the "timestampCastAsLong" column simply shows that a direct cast to Long does not return the desired result in milliseconds, only in seconds.
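The relationship between the two long columns can be reproduced outside Spark: for these rows, the cast's whole seconds equal getTime floored by 1000. This plain Scala sketch assumes the sample strings were parsed in a UTC+1 session time zone, which is what the epoch values in the table imply, so it builds the same instants directly at 10:00 UTC. The object and method names are hypothetical.

```scala
import java.sql.Timestamp
import java.time.Instant

object CastVsGetTime {
  // Whole seconds since the epoch, rounding toward negative infinity
  def castAsLongEquivalent(ts: Timestamp): Long = Math.floorDiv(ts.getTime, 1000L)

  def main(args: Array[String]): Unit = {
    // The table's four rows as instants (11:00 in an assumed UTC+1 zone = 10:00 UTC)
    val fractions = Seq("00.000", "00.111", "00.110", "00.100")
    for (f <- fractions) {
      val ts = Timestamp.from(Instant.parse(s"2017-01-18T10:00:${f}Z"))
      // First value mirrors timestampConversionToLong, second timestampCastAsLong
      println(s"${ts.getTime} ${castAsLongEquivalent(ts)}")
    }
  }
}
```

Math.floorDiv is used instead of `/` so that pre-1970 instants round toward negative infinity (`new Timestamp(-500L)` maps to -1, not 0). Whether Spark's cast behaves the same way for such values is an assumption worth checking on your Spark version.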