I'm running into a problem in Spark SQL: when I cast a column from string to timestamp, the value becomes NULL. Details below:
val df2 = sql("""select FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss')""")
df2: org.apache.spark.sql.DataFrame = [from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss): string]
scala> df2.show
+----------------------------------------------------------------------------------------------------------------------------------------+
|from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss)|
+----------------------------------------------------------------------------------------------------------------------------------------+
| 20181001 00:00:00|
+----------------------------------------------------------------------------------------------------------------------------------------+
When I explicitly cast the result to timestamp, it does not give me the value I want.
val df2 = sql("""select cast(FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss') as timestamp)""")
df2: org.apache.spark.sql.DataFrame = [CAST(from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss) AS TIMESTAMP): timestamp]
scala> df2.show
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
|CAST(from_unixtime(unix_timestamp(to_date(last_day(add_months(CAST(concat_ws(-, 2018, 10, 01) AS DATE), 0))), yyyy-MM-dd), yyyyMMdd HH:mm:ss) AS TIMESTAMP)|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| null|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
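The null comes from the cast itself: a plain CAST from string to timestamp only accepts the default 'yyyy-MM-dd HH:mm:ss' layout (and ISO-8601 variants), and the formatted string '20181001 00:00:00' does not match that pattern. A minimal sketch of the difference in the same spark-shell style (these two queries are illustrative and not part of the original question):

// A string already in the default layout casts cleanly and should print 2018-10-01 00:00:00
val okDf = sql("""select cast('2018-10-01 00:00:00' as timestamp)""")
okDf.show

// The compact 'yyyyMMdd HH:mm:ss' layout is not recognized by CAST, so this prints null
val nullDf = sql("""select cast('20181001 00:00:00' as timestamp)""")
nullDf.show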
Any idea how to solve this?
Answer 0 (score: 1)
Please try the following:
val df2 = spark.sql(
"""select CAST(unix_timestamp(FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss'),'yyyyMMdd HH:mm:ss') as timestamp) as destination""".stripMargin)
df2.show(false)
df2.printSchema()
+-------------------+
|destination |
+-------------------+
|2018-10-31 00:00:00|
+-------------------+
root
|-- destination: timestamp (nullable = true)
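This works because unix_timestamp is given the matching pattern 'yyyyMMdd HH:mm:ss', so the formatted string is parsed back to epoch seconds before the final cast. On Spark 2.2 or later, to_timestamp with an explicit pattern expresses the same idea a little more directly; a sketch assuming the same spark-shell session (the alias destination is kept only for illustration):

val df3 = spark.sql(
  """select to_timestamp(FROM_UNIXTIME(UNIX_TIMESTAMP(to_date(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','10','01'),0))),'yyyy-MM-dd'),'yyyyMMdd HH:mm:ss'), 'yyyyMMdd HH:mm:ss') as destination""")
df3.printSchema()  // destination should be reported as timestamp
df3.show(false)    // expected 2018-10-31 00:00:00 for this input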
Answer 1 (score: 0)
I tried it like this, without relying on any Spark internals.
val df2 = sql("""cast(FROM_UNIXTIME(UNIX_TIMESTAMP(cast(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','12','31'),0)) as timestamp))) as timestamp)""")
scala> df2.show
+--------------------+
|2018-12-31 00:00:...|
+--------------------+
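Since LAST_DAY already returns a date, the round trip through UNIX_TIMESTAMP/FROM_UNIXTIME can likely be dropped entirely; a minimal sketch of the shorter form (illustrative, not part of the original answer, and the alias last_day_ts is made up):

val df4 = sql("""select cast(LAST_DAY(ADD_MONTHS(CONCAT_WS('-','2018','12','31'),0)) as timestamp) as last_day_ts""")
df4.printSchema()  // last_day_ts should be reported as timestamp
df4.show(false)    // expected 2018-12-31 00:00:00 for this input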