When converting strings to dates, only strings in MM-dd-yyyy format should be converted, but alphanumeric and MM-dd-yy strings are converted to dates as well

Asked: 2020-05-05 08:34:01

Tags: date pyspark

I want to validate whether the strings in a column match a specific date format.

For example, the required date format is MM-dd-yyyy.

Input:

col1
2019-01-01
01-01-2019
01-01-19
01-01-201B

Required output:

col1        |col2
------------|-------------
2019-01-01  |null
01-01-2019  |2019-01-01
01-01-19    |null
01-01-201B  |null

But the output I actually get for the third and fourth rows is:

col1        |col2
------------|-------------
01-01-19    |0019-01-01
01-01-201B  |0201-01-01

Here is the sample code:

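import pyspark.sql.functions as sf
from pyspark.sql import SparkSession

# Reuse the active session (e.g. in a shell or notebook) or create one
spark = SparkSession.builder.getOrCreate()

a = [("zxczxc AS OF 2019-01-01",), ("asasdwer AS OF 01-01-2019",), ("ssadflksad AS OF 01-01-20",), ("wrongdt AS OF dt------",),
     ("again wrgdt AS OF 01-01-201b",), ("crct AS OF 01-01-2019 asdasd",), ("asasdwer AU 01-01-2019",)]

df = spark.createDataFrame(a, ["col1"])
# col2: the 10 characters that follow "AS OF ", or "-1" when the marker is absent
df = df.withColumn("col2", sf.when(sf.instr("col1", "AS OF") != 0, sf.col("col1").substr(sf.instr("col1", "AS OF") + 6, sf.lit(10))).otherwise("-1"))
# col3: the same substring parsed with to_date; the unwanted conversions show up here
df = df.withColumn("col3", sf.when(sf.instr("col1", "AS OF") != 0, sf.to_date(sf.col("col1").substr(sf.instr("col1", "AS OF") + 6, sf.lit(10)), "MM-dd-yyyy")).otherwise("-1"))

df.show()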
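
One possible way to get the required output, sketched here under assumptions: check the shape of the extracted string with a regular expression before handing it to to_date, so that anything that is not exactly two digits, two digits, four digits becomes null. The names STRICT_MDY and add_strict_date are made up for this illustration; rlike, when and to_date are existing PySpark functions. The pattern only checks the shape of the string, so an impossible date such as 02-30-2019 still reaches to_date and is handled by whatever to_date does with it.

```python
import re

# Assumption: "MM-dd-yyyy" is taken to mean exactly 2 digits, 2 digits, 4 digits,
# with month 01-12 and day 01-31. This is a shape check, not a calendar check.
STRICT_MDY = r"^(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])-[0-9]{4}$"


def add_strict_date(df, src="col2", dst="col3"):
    """Return df with dst set to the parsed date when src matches STRICT_MDY, else null.

    Hypothetical helper, not part of PySpark.
    """
    # Imported here so the pattern can be tried out without PySpark installed.
    import pyspark.sql.functions as sf

    return df.withColumn(
        dst,
        # when() without otherwise() yields null for rows that fail the check
        sf.when(sf.col(src).rlike(STRICT_MDY), sf.to_date(sf.col(src), "MM-dd-yyyy")),
    )


# Quick check of the pattern in plain Python against the strings from the question
for s in ["2019-01-01", "01-01-2019", "01-01-19", "01-01-201B"]:
    print(s, bool(re.match(STRICT_MDY, s)))
```

With the DataFrame from the question, this would be called as add_strict_date(df, "col2", "col3") once col2 has been built.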

0 answers:

No answers yet