Unable to write a table with a header to an s3 path in pyspark?

Date: 2019-04-29 12:13:17

Tags: pyspark pyspark-sql

I have the following dataset:

+-------------------+-------+------------+                                      
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

I write the table above to an s3 path with the following line:

df.write.option("header","true").mode("overwrite").csv("<s3: path>")

However, when I read the table back for further operations, it looks like this:

+-------------------+-------+------------+                                      
|                _c0|    _c1|         _c2|
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

What I expect instead is:

+-------------------+-------+------------+                                      
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST               |920799 |loyalty     |
|TEST               |922428 |loyalty     |
|TEST               |2063890|loyalty     |
|TEST               |2344814|loyalty     |
|TEST               |2355426|loyalty     |
|TEST               |2618707|loyalty     |
+-------------------+-------+------------+

I also tried writing the file in parquet format, which works, but I specifically need the output in .csv format. Any help or hints would be appreciated.

1 Answer:

Answer 0: (score: 0)

This should do it:

sqlContext.read.csv("s3:///file_path", header=True)