Convert a string to a date, format 2020-04-21T11:28:40.321328+00:00

Time: 2020-04-21 11:40:41

Tags: apache-spark pyspark apache-spark-sql

I'm using Spark Structured Streaming with PySpark.

I have a string in the following format:

2020-04-21T11:28:40.321328+00:00

I need to change the date format to yyyy-MM-dd HH:mm:ss, and I'm trying this:

date_format(to_timestamp('value.Ticker.time', "yyyy-MM-dd'T'HH:mm:ss.sssssssZ"), "yyyy-MM-dd HH:mm:ss")

But the result is null.

My code is:

BytesDF_Data_Level_2 = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092") \
  .option("subscribe", "data_level_2") \
  .load()

StringDF_Data_Level_2 = BytesDF_Data_Level_2.selectExpr("CAST(value AS STRING)")
JsonDF_Data_Level_2 = StringDF_Data_Level_2.withColumn("value", from_json("value", schema_data_level_II))
JsonDF_cols_Data_Level_2 = JsonDF_Data_Level_2.select(
    #col('value.Ticker.contract.Forex.tradingClass'),
    col('value.Ticker.time'),
    date_format(to_timestamp('value.Ticker.time', "yyyy-MM-dd'T'HH:mm:ss.sssssssZ"), "yyyy-MM-dd HH:mm:ss")

    #col('value.Ticker.bid'),
    #col('value.Ticker.bidSize'),
    #col('value.Ticker.ask'),
    #col('value.Ticker.askSize')
    )

query = JsonDF_cols_Data_Level_2 \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .option("truncate", "false") \
    .start()

query.awaitTermination()

Thanks!

1 Answer:

Answer 0 (score: 0)

Try the to_timestamp (or) from_unixtime(unix_timestamp()) functions with the format "yyyy-MM-dd'T'HH:mm:ss":

df.withColumn("new_time", to_timestamp(col("time"),"yyyy-MM-dd'T'HH:mm:ss")).show(10,False)
#+--------------------------------+-------------------+
#|time                            |new_time           |
#+--------------------------------+-------------------+
#|2020-04-21T11:28:40.321328+00:00|2020-04-21 11:28:40|
#+--------------------------------+-------------------+

#using date_format
df.withColumn("new_time", date_format(to_timestamp(col("time"),"yyyy-MM-dd'T'HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")).show(10,False)
#+--------------------------------+-------------------+
#|time                            |new_time           |
#+--------------------------------+-------------------+
#|2020-04-21T11:28:40.321328+00:00|2020-04-21 11:28:40|
#+--------------------------------+-------------------+

#using from_unixtime, unix_timestamp functions
df.withColumn("new_time", from_unixtime(unix_timestamp(col("time"),"yyyy-MM-dd'T'HH:mm:ss"),"yyyy-MM-dd HH:mm:ss")).show(10,False)
#+--------------------------------+-------------------+
#|time                            |new_time           |
#+--------------------------------+-------------------+
#|2020-04-21T11:28:40.321328+00:00|2020-04-21 11:28:40|
#+--------------------------------+-------------------+

For Spark-3:

df.withColumn("new_time",to_timestamp((col('time').substr(1, 19)) ,"yyyy-MM-dd'T'HH:mm:ss")).show(10,False)
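The substring trick works because the first 19 characters of the input string are exactly the `yyyy-MM-dd'T'HH:mm:ss` portion, dropping the fractional seconds and the UTC offset that the pattern does not cover. As an illustration only (plain Python, not Spark), the same truncate-then-parse step looks like this:

```python
from datetime import datetime

ts = "2020-04-21T11:28:40.321328+00:00"

# Take the first 19 characters: "2020-04-21T11:28:40",
# i.e. the yyyy-MM-dd'T'HH:mm:ss portion without the
# fractional seconds and the "+00:00" offset.
truncated = ts[:19]

# Parse with the matching pattern, then reformat as
# "yyyy-MM-dd HH:mm:ss".
parsed = datetime.strptime(truncated, "%Y-%m-%dT%H:%M:%S")
print(parsed.strftime("%Y-%m-%d %H:%M:%S"))  # 2020-04-21 11:28:40
```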