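I can't split the Experience_datesEmployeed column, whose values contain a long dash (em dash). How can I split the string on that character, or remove the long dash from the column values?
I tried reading the file with UTF-8 encoding: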
df_final=spark.read.options(header="True",inferSchema='True',delimiter=',').option("encoding", "UTF-8").csv("/path/csv")
I then tried splitting on the dash by its Unicode code point, e.g. 8212, 8211, 2014:
df_final.withColumn('Splitted', split(df_final['Experience_datesEmployeed'], u'\u2014')[0]).show()
Sample CSV file:
fullName,Experience_datesEmployeed,Experience_expcompany,Experience_expduraation, Experience_position
David,Feb 1999 – Sep 2001, Foothill,2 yrs 8 mos, Marketing Assoicate
David,1994 – 1997, abc,3 yrs,Senior Auditor
David,Jun 2020 – Present, Fellows INC,3 mos,Director Board
David,2017 – Jun 2019, Fellows INC ,2 yrs,Fellow - Class 22
David,Sep 2001 – Present, The John D.,19 yrs, Manager
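
One thing that may be worth checking: in the sample as pasted here, the separator appears to be an en dash (U+2013, decimal 8211) rather than an em dash (U+2014, decimal 8212), so a split on u'\u2014' alone would find nothing to split on. Below is a minimal plain-Python sketch, under that assumption, that first reports which non-ASCII characters a value contains and then splits on either dash. The names `DASH_PATTERN`, `non_ascii_codepoints` and `split_dates` are hypothetical, introduced only for illustration.

```python
import re

# Regex matching an en dash (U+2013) or an em dash (U+2014),
# together with any whitespace around it.
DASH_PATTERN = "\\s*[\u2013\u2014]\\s*"


def non_ascii_codepoints(value):
    """List the code points of all non-ASCII characters in a string."""
    return [f"U+{ord(ch):04X}" for ch in value if ord(ch) > 127]


def split_dates(value):
    """Split a date range such as 'Feb 1999 <dash> Sep 2001' into its parts."""
    return re.split(DASH_PATTERN, value.strip())


if __name__ == "__main__":
    # Values modelled on the sample CSV; the dash is written as an escape
    # so that it is unambiguous which character is being used.
    samples = ["Feb 1999 \u2013 Sep 2001", "1994 \u2013 1997", "Jun 2020 \u2013 Present"]
    for s in samples:
        print(non_ascii_codepoints(s), split_dates(s))
```

If this matches the real data, the same pattern string could presumably be passed to PySpark's `split` (from `pyspark.sql.functions`, which takes a regex pattern), for example `split(df_final['Experience_datesEmployeed'], DASH_PATTERN)[0]`, or to `regexp_replace` to strip the dash instead. That part is untested here and assumes the file was decoded correctly as UTF-8.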