Question

I have a CSV file in HDFS that I am reading with PySpark. One column contains quoted values, and because of that the columns after it come back as null.

df = spark.read.csv("hdfs://ip-10-0-0-178.ec2.internal:8020/user/root/etl_project/part-m-00000", header=False, schema=fileSchema, escape="\"")

It is read as:

year	month	day	weekday	hour	atm_status	atm_id	atm_manufacturer	atm_location	atm_streetname	atm_street_number	atm_zipcode	atm_lat	atm_lon	currency	card_type	service	message_code	message_text	weather_lat	weather_lon	weather_city_id	weather_city_name	temp	pressure	humidity	wind_speed	wind_deg	rain_3h	clouds_all	weather_id	weather_main	weather_description
2017	January	1	Sunday	9	Active	41	Diebold Nixdorf	Skagen	St. Laurentiivej	36	9990	57.7226197	10.5900563	DKK	Mastercard - online	Withdrawal	4014	'Suspected malfunction,0.000,57.721,11,2613939,0.000,277,1005,75,3,290.000,0,80,803,Clouds'	null	null	null	null	null	null	null	null	null	null	null	null	null	null

Expected output:

year	month	day	weekday	hour	atm_status	atm_id	atm_manufacturer	atm_location	atm_streetname	atm_street_number	atm_zipcode	atm_lat	atm_lon	currency	card_type	service	message_code	message_text	weather_lat	weather_lon	weather_city_id	weather_city_name	temp	pressure	humidity	wind_speed	wind_deg	rain_3h	clouds_all	weather_id	weather_main	weather_description
2017	January	1	Sunday	9	Active	41	Diebold Nixdorf	Skagen	St. Laurentiivej	36	9990	57.7226197	10.5900563	DKK	Mastercard - online	Withdrawal	4014	"Suspected malfunction"	57.720928	10.58394	2613939	Skagen	277.41	1005	75	3	290		80	803	Clouds	broken clouds

How can I read the file correctly with a PySpark DataFrame to get the expected output?
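
For reference, here is a minimal sketch of one variation that may be worth trying. It assumes the problem comes from a stray or unbalanced quote character in message_text, so the parser treats the rest of the line as one quoted field. It also assumes that no field legitimately contains a comma inside quotes. Spark's CSV reader documents that setting quote to an empty string turns quote handling off. The helper names csv_options_without_quoting and read_without_quoting are hypothetical and only serve the illustration.

```python
def csv_options_without_quoting():
    """Reader options that treat quote characters as ordinary data."""
    return {
        "header": False,  # the file has no header row, as in the original call
        "sep": ",",       # assumption: fields are comma-separated
        "quote": "",      # an empty string disables quote handling in Spark's CSV reader
    }


def read_without_quoting(spark, path, schema):
    """Read `path` with quoting disabled; `spark` is an existing SparkSession."""
    return spark.read.csv(path, schema=schema, **csv_options_without_quoting())
```

It would be called as df = read_without_quoting(spark, "hdfs://ip-10-0-0-178.ec2.internal:8020/user/root/etl_project/part-m-00000", fileSchema). With quoting disabled, any quote characters stay in the column values as literal text, which seems to match the quoted message_text in the expected output, but I have not confirmed that this is the right fix for this file.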