I have some code that reads a Parquet file and then displays it, like this:
from pyspark.sql import SQLContext

sc = spark.sparkContext
sqlContext = SQLContext(sc)
# Read the Parquet data from HDFS and pull back the first 100 rows
lines = sqlContext.read.parquet("hdfs:////home/records/")
lines.take(100)
This works fine, but I want to create a CSV file from that output:
[Row(trans_key=1130, job_id=2005972, rec=1, old_id=833715, amount=2, temp_value=0.55, loc_id=31642),
 Row(trans_key=1230, job_id=2005972, rec=4, old_id=832715, amount=22, temp_value=0.99, loc_id=31642),
 Row(trans_key=1930, job_id=2905972, rec=5, old_id=831715, amount=32, temp_value=0.33, loc_id=31642),
 Row(trans_key=1430, job_id=2705972, rec=6, old_id=833775, amount=20, temp_value=0.10, loc_id=31642),
I would like to create a CSV file with a header row of column names followed by the comma-separated data, like this:
trans_key,job_id,rec,old_id,amount,temp_value,loc_id
1130,2005972,1,833715,2,0.55,31642
1230,2005972,4,832715,22,0.99,31642
1430,2705972,6,833775,20,0.10,31642
I'm stuck on how to get my results out of the Parquet file and into a CSV file. Can you help me?
Answer 0 (score: 1)

This should work:
lines.repartition(1).write.format('com.databricks.spark.csv').save('path+my.csv', header='true')
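Note that this relies on the external spark-csv package (com.databricks.spark.csv), which is what you need on Spark 1.x. If you are on Spark 2.0 or later, CSV support is built into the DataFrame writer and no extra package is required. A minimal sketch, assuming the same records read from the question's HDFS path and a hypothetical output directory name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
lines = spark.read.parquet("hdfs:////home/records/")

# coalesce(1) collapses the output to a single part file;
# Spark still writes it inside a directory, not as one bare file.
(lines.coalesce(1)
      .write
      .option("header", "true")   # emit the column names as the first row
      .mode("overwrite")
      .csv("hdfs:///home/records_csv"))   # hypothetical output directory

Either way, Spark writes a directory containing a part-*.csv file rather than a single file named my.csv; if you need one literal file, rename or merge the part file afterwards.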