I have the following dataset:
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
I use the following script to write the table above to an S3 path:
df.write.option("header","true").mode("overwrite").csv("<s3: path>")
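However, when I read the table back for further processing, the header row comes back as data and the columns get default names: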
+-------------------+-------+------------+
| _c0| _c1| _c2|
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
I want the table to look like this instead:
+-------------------+-------+------------+
|test_control_status|user_id|loyalty_type|
+-------------------+-------+------------+
|TEST |920799 |loyalty |
|TEST |922428 |loyalty |
|TEST |2063890|loyalty |
|TEST |2344814|loyalty |
|TEST |2355426|loyalty |
|TEST |2618707|loyalty |
+-------------------+-------+------------+
I tried writing the file in parquet format, and that works, but I need to write it in .csv format.
Any help or hints would be greatly appreciated.
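Answer 0 (score: 0)
The file is written correctly; the header option also has to be set when reading it back. This should do it: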
sqlContext.read.csv("s3:///file_path", header = True)