Can anyone tell me why I get the following result?
scala> val b = to_timestamp($"DATETIME", "ddMMMYYYY:HH:mm:ss")
b: org.apache.spark.sql.Column = to_timestamp(`DATETIME`, 'ddMMMYYYY:HH:mm:ss')
scala> sourceRawData.withColumn("ts", b).show(6,false)
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
|DATETIME |LOAD_DATETIME |SOURCE_BANK|EMP_NAME|HEADER_ROW_COUNT|EMP_HOURS|ts |
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
|01JAN2017:01:02:03|01JAN2017:01:02:03 | RBS | Naveen |100 |15.23 |2017-01-01 01:02:03|
|15MAR2017:01:02:03|15MAR2017:01:02:03 | RBS | Naveen |100 |115.78 |2017-01-01 01:02:03|
|02APR2015:23:24:25|02APR2015:23:24:25 | RBS |Arun |200 |2.09 |2014-12-28 23:24:25|
|28MAY2010:12:13:14| 28MAY2010:12:13:14|RBS |Arun |100 |30.98 |2009-12-27 12:13:14|
|04JUN2018:10:11:12|04JUN2018:10:11:12 |XZX | Arun |400 |12.0 |2017-12-31 10:11:12|
+------------------+-------------------+-----------+--------+----------------+---------+-------------------+
I'm trying to convert DATETIME (in ddMMMYYYY:HH:mm:ss format) to a Timestamp (shown in the last column above), but it does not seem to be converted to the correct value. I referred to the following post, but it didn't help:
Better way to convert a string field into timestamp in Spark
Can anyone help me?
Answer 0 (score: 3)
Use y (year) instead of Y (week year):
spark.sql("SELECT to_timestamp('04JUN2018:10:11:12', 'ddMMMyyyy:HH:mm:ss')").show
// +--------------------------------------------------------+
// |to_timestamp('04JUN2018:10:11:12', 'ddMMMyyyy:HH:mm:ss')|
// +--------------------------------------------------------+
// | 2018-06-04 10:11:12|
// +--------------------------------------------------------+
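Applied back to the DataFrame from the question, a minimal sketch (reusing the sourceRawData and DATETIME names from above) would be:

val ts = to_timestamp($"DATETIME", "ddMMMyyyy:HH:mm:ss")  // lowercase 'y' = year, uppercase 'Y' = week year
sourceRawData.withColumn("ts", ts).show(6, false)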
Answer 1 (score: -1)
Try this UDF:
import java.text.SimpleDateFormat
import org.apache.spark.sql.functions.{lit, udf}

// Re-parses a date string from the current format (cFormat) into the requested format (rFormat)
val changeDtFmt = udf { (cFormat: String, rFormat: String, date: String) =>
  val formatterOld = new SimpleDateFormat(cFormat)
  val formatterNew = new SimpleDateFormat(rFormat)
  formatterNew.format(formatterOld.parse(date))
}

sourceRawData.
  withColumn("ts",
    changeDtFmt(lit("ddMMMyyyy:HH:mm:ss"), lit("yyyy-MM-dd HH:mm:ss"), $"DATETIME")).
  show(6, false)
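Note: since a fresh SimpleDateFormat is created on every call inside the UDF, its usual thread-safety issues don't apply here, but the built-in to_timestamp shown in the other answer avoids the UDF overhead and lets Catalyst optimize the expression.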
Answer 2 (score: -1)
Try the following code.
I created a sample DataFrame df:
+---+-------------------+
| id| date|
+---+-------------------+
| 1| 01JAN2017:01:02:03|
| 2| 15MAR2017:01:02:03|
| 3|02APR2015:23:24:25 |
+---+-------------------+
val t_s = unix_timestamp($"date", "ddMMMyyyy:HH:mm:ss").cast("timestamp")
df.withColumn("ts", t_s).show()
+---+-------------------+--------------------+
| id| date| ts|
+---+-------------------+--------------------+
| 1| 01JAN2017:01:02:03|2017-01-01 01:02:...|
| 2| 15MAR2017:01:02:03|2017-03-15 01:02:...|
| 3|02APR2015:23:24:25 |2015-04-02 23:24:...|
+---+-------------------+--------------------+
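The ts values above are only cut off by show()'s default 20-character column truncation (the timestamp renders with a trailing fraction such as .0, pushing it past 20 characters); passing false disables the truncation:

df.withColumn("ts", t_s).show(false)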
Thanks