Spark: parse date format "MMM dd, yyyy hh:mm:ss AM" into a timestamp in a DataFrame

Asked: 2018-11-14 16:26:03

Tags: apache-spark apache-spark-sql

I need to convert a human-readable date format from a log file, "MMM dd, yyyy hh:mm:ss AM/PM", into the Spark timestamp data type. I tried something like the following, but it returns null.

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")
df.withColumn("time2",date_format('times,"MMM dd, yyyy HH:mm:ss AM")).show(false)

+------------------------+-----+
|times                   |time2|
+------------------------+-----+
|Nov 05, 2018 02:46:47 AM|null |
|Nov 5, 2018 02:46:47 PM |null |
+------------------------+-----+

Expected output:

+------------------------+--------------------------+
|times                   |time2                     |
+------------------------+--------------------------+
|Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
|Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
+------------------------+--------------------------+

What is the correct format pattern for this conversion? Note that the day of month may or may not have a leading zero.

3 Answers:

Answer 0 (score: 2)

Here is your answer:

val df = Seq(("Nov 05, 2018 02:46:47 AM"),("Nov 5, 2018 02:46:47 PM")).toDF("times")

scala> df.withColumn("times2", from_unixtime(unix_timestamp(col("times"), "MMM d, yyyy hh:mm:ss a"),"yyyy-MM-dd HH:mm:ss.SSSSSS")).show(false)
    +------------------------+--------------------------+
    |times                   |times2                    |
    +------------------------+--------------------------+
    |Nov 05, 2018 02:46:47 AM|2018-11-05 02:46:47.000000|
    |Nov 5, 2018 02:46:47 PM |2018-11-05 14:46:47.000000|
    +------------------------+--------------------------+

To parse the 12-hour clock, use "hh" instead of "HH". The pattern letter "a" matches the AM/PM marker when parsing.

Hope this helps!

Answer 1 (score: 1)

Use the to_timestamp and date_format functions:

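The code block for this answer appears to have been lost. A minimal sketch of the to_timestamp / date_format approach, assuming the same `times` column as in the question (requires an active SparkSession and `spark.implicits._`):

```scala
import org.apache.spark.sql.functions.{to_timestamp, date_format}
import spark.implicits._

val df = Seq("Nov 05, 2018 02:46:47 AM", "Nov 5, 2018 02:46:47 PM").toDF("times")

// to_timestamp parses the string into a TimestampType column.
// "MMM d, yyyy hh:mm:ss a" accepts days both with and without a leading zero;
// date_format then renders the timestamp back to a string in the desired layout.
df.withColumn("time2", to_timestamp('times, "MMM d, yyyy hh:mm:ss a"))
  .withColumn("time2_str", date_format('time2, "yyyy-MM-dd HH:mm:ss.SSSSSS"))
  .show(false)
```

Unlike the unix_timestamp/from_unixtime approach in Answer 0, `time2` here is a true TimestampType column rather than a string, so it can be compared and sorted as a timestamp.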

Answer 2 (score: 0)

Using SQL syntax:

select date_format(to_timestamp(ColumnTimestamp, "MM/dd/yyyy hh:mm:ss aa"), "yyyy-MM-dd") as ColumnDate 
from database_name.table_name
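Applied to the question's data, the same SQL expression can be run through spark.sql; the view name `logs` below is a placeholder for illustration, and the pattern is adjusted to the question's "MMM d, yyyy hh:mm:ss a" layout:

```scala
import spark.implicits._

val df = Seq("Nov 05, 2018 02:46:47 AM", "Nov 5, 2018 02:46:47 PM").toDF("times")
df.createOrReplaceTempView("logs")  // hypothetical view name

spark.sql(
  """SELECT times,
    |       date_format(to_timestamp(times, 'MMM d, yyyy hh:mm:ss a'),
    |                   'yyyy-MM-dd HH:mm:ss') AS time2
    |FROM logs""".stripMargin).show(false)
```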