I want to validate whether the strings in a column match a specific date format.
For example, the required date format is MM-dd-yyyy.
Input:
col1
2019-01-01
01-01-2019
01-01-19
01-01-201B
Desired output:
col1        |col2
------------|-------------
2019-01-01  |null
01-01-2019  |2019-01-01
01-01-19    |null
01-01-201B  |null
However, for the third and fourth rows, the output I actually get is:
col1        |col2
------------|-------------
01-01-19    |0019-01-01
01-01-201B  |0201-01-01
Here is the sample code:
import pyspark.sql.functions as sf
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # already defined as `spark` in the PySpark shell
a = [("zxczxc AS OF 2019-01-01",), ("asasdwer AS OF 01-01-2019",), ("ssadflksad AS OF 01-01-20",), ("wrongdt AS OF dt------",),
     ("again wrgdt AS OF 01-01-201b",), ("crct AS OF 01-01-2019 asdasd",), ("asasdwer AU 01-01-2019",)]
df = spark.createDataFrame(a, ["col1"])
# 1-based position of "AS OF" in col1 (0 if absent); the date candidate starts 6 characters later.
pos = sf.instr("col1", "AS OF")
date_str = sf.col("col1").substr(pos + 6, sf.lit(10))
# col2: the raw 10-character candidate; col3: the same substring parsed as MM-dd-yyyy.
df = df.withColumn("col2", sf.when(pos != 0, date_str).otherwise("-1"))
df = df.withColumn("col3", sf.when(pos != 0, sf.to_date(date_str, "MM-dd-yyyy")).otherwise("-1"))
df.show()
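
To pin down the behavior I am after, here is a minimal plain-Python sketch of the strict rule, outside Spark. The outputs above suggest the parser accepts a year with fewer than four digits and ignores trailing characters, so this sketch first checks the exact shape of the string and only then parses it. The helper name `parse_strict_mm_dd_yyyy` is hypothetical and only illustrates the expected results; it is not part of my Spark job.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical helper for illustration only: exactly two digits, two digits, four digits.
_STRICT_MM_DD_YYYY = re.compile(r"\d{2}-\d{2}-\d{4}", re.ASCII)


def parse_strict_mm_dd_yyyy(value: str) -> Optional[str]:
    """Return the date as yyyy-MM-dd, or None if value is not exactly MM-dd-yyyy."""
    # fullmatch rejects short years ("01-01-19") and stray letters ("01-01-201B").
    if not _STRICT_MM_DD_YYYY.fullmatch(value):
        return None
    try:
        # strptime rejects impossible dates such as month 13 or February 30.
        return datetime.strptime(value, "%m-%d-%Y").date().isoformat()
    except ValueError:
        return None


for s in ["2019-01-01", "01-01-2019", "01-01-19", "01-01-201B"]:
    print(s, "|", parse_strict_mm_dd_yyyy(s))
```

Only 01-01-2019 should produce a date here; the other three inputs should come back as None, which matches the desired output table. In Spark, the same idea could presumably be expressed by guarding `to_date` with a pattern check such as `Column.rlike`, though I have not verified that variant.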