Spark SQL: string-to-timestamp conversion turns the value into NULL

Time: 2019-02-27 02:09:45

Tags: apache-spark-sql

I have a problem in Spark SQL: when I cast a column from string to timestamp, the value becomes NULL. Details below:

val df2 = sql("""select FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss')""")
df2: org.apache.spark.sql.DataFrame = [from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss): string]


scala> df2.show
+----------------------------------------------------------------------------------------------------------------------------------------+
|from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss)|
+----------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                       20181001 00:00:00|
+----------------------------------------------------------------------------------------------------------------------------------------+

When I explicitly cast the result to timestamp, I do not get the result I want.

val df2 = sql("""select cast(FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss') as timestamp)""")
df2: org.apache.spark.sql.DataFrame = [CAST(from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss) AS TIMESTAMP): timestamp]


scala> df2.show
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|CAST(from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss) AS TIMESTAMP)|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                                                       null|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+

Any ideas on how to fix this?

2 answers:

Answer 0 (score: 1)

Try the following:

val df2 = spark.sql(
  """select CAST(
    |  unix_timestamp(
    |    FROM_UNIXTIME(
    |      UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),
    |      'yyyyMMdd HH:mm:ss'),
    |    'yyyyMMdd HH:mm:ss')
    |  as timestamp) as destination""".stripMargin)

df2.show(false)
df2.printSchema()

+-------------------+
|destination        |
+-------------------+
|2018-10-31 00:00:00|
+-------------------+

root
 |-- destination: timestamp (nullable = true)
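
Why this helps, as far as I can tell: a plain `cast(... as timestamp)` on a string only understands ISO-style text such as `yyyy-MM-dd HH:mm:ss`, so the `yyyyMMdd HH:mm:ss` string produced by `FROM_UNIXTIME` cannot be parsed and comes back as null. Parsing it with an explicit pattern avoids that. Below is a minimal sketch comparing the two, assuming Spark 2.2 or later (where `to_timestamp` accepts a format argument) and ANSI mode switched off. The object name `CastVsParse`, the method `compare`, and the column aliases are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, used only to illustrate the difference.
object CastVsParse {
  // Returns one row: a plain cast next to a pattern-based parse of the same string.
  def compare(spark: SparkSession): DataFrame =
    spark.sql(
      """select
        |  cast('20181031 00:00:00' as timestamp) as plain_cast,
        |  to_timestamp('20181031 00:00:00', 'yyyyMMdd HH:mm:ss') as parsed
        |""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-vs-parse")
      // Assumption: ANSI mode off, so an unparseable cast yields null instead of an error.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()

    compare(spark).show(false)
    spark.stop()
  }
}
```

Under those assumptions `plain_cast` should be null, as in the question, while `parsed` carries the timestamp. `to_timestamp` with a pattern does in one call what the nested `unix_timestamp(..., 'yyyyMMdd HH:mm:ss')` plus `CAST` does above.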

Answer 1 (score: 0)

I tried it this way, without using any Spark internals.

val df2 = sql("""select cast(FROM_UNIXTIME(UNIX_TIMESTAMP(cast(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','12','31'),0)) as timestamp))) as timestamp)""")

scala> df2.show
+--------------------+
|2018-12-31 00:00:...|
+--------------------+
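
If the goal is only a midnight timestamp for the last day of the month, the round trip through `FROM_UNIXTIME` and `UNIX_TIMESTAMP` may not be needed at all, since a date can be cast to a timestamp directly. Here is a minimal sketch under that assumption, reusing the question's `2018-10-01` input; `LastDayAsTimestamp` and `lastDay` are illustrative names only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, shown only as a simpler alternative.
object LastDayAsTimestamp {
  // Builds the date string, takes the last day of its month,
  // and casts that date straight to a timestamp (midnight of that day).
  def lastDay(spark: SparkSession): DataFrame =
    spark.sql(
      """select cast(
        |  last_day(to_date(concat_ws('-', '2018', '10', '01')))
        |  as timestamp) as destination""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("last-day-as-timestamp")
      .getOrCreate()

    val df = lastDay(spark)
    df.show(false)
    df.printSchema()
    spark.stop()
  }
}
```

Because no formatted string is ever produced, there is no pattern that the final cast could fail to recognise.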