Spark databricks csv removes trailing spaces from each line

Date: 2018-02-21 15:13:51

Tags: apache-spark

I have a DataFrame with only one column, and I am writing that DataFrame to an HDFS location using the databricks spark-csv package.

Every row has some trailing spaces, and I want those spaces to be kept when the DataFrame is stored.

My Spark version is 1.5 on CDH 5.5.

I am running the Spark job as follows:

 spark-submit --packages com.databricks:spark-csv_2.10:1.1.0 


df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").save(path)

Trailing spaces are included when the above write happens, but each line has quotes at the start and at the end of the line.
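For reference, a minimal sketch of this setup (the column name "line", the sample values, and the output path are assumptions for illustration only):

    import org.apache.spark.sql.{SQLContext, SaveMode}

    val sqlContext = new SQLContext(sc)          // sc: the existing SparkContext
    import sqlContext.implicits._

    // "line" is a hypothetical column name; the sample values end with trailing spaces
    val df = sc.parallelize(Seq("foo   ", "bar ")).toDF("line")

    // Default spark-csv write: trailing spaces survive, but each value comes out
    // wrapped in double quotes, e.g. "foo   "
    df.coalesce(1)
      .write.mode(SaveMode.Overwrite)
      .format("com.databricks.spark.csv")
      .save("/tmp/out")                          // hypothetical HDFS path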

So I tried the following:

df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").option("quote","\u0000").save(path)

The starting and ending quotes are removed, but the trailing spaces are removed as well.

So I tried the following:

df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").option("ignoreLeadingWhiteSpace","false").option("ignoreTrailingWhiteSpace","false").option("quote","\u0000").save(path)

No impact. The starting and ending quotes are removed, but the trailing spaces are removed as well.

I don't want the quotes to be stored, but I do want the trailing spaces to be kept.

How can I achieve this?
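One possible workaround, sketched below under the assumption that the single column holds plain strings (the helper name writeRawText is only illustrative): since the DataFrame has just one column, the CSV writer can be bypassed and the raw values written as text; saveAsTextFile applies no quoting, so trailing spaces are left untouched.

    import org.apache.spark.sql.DataFrame

    // Sketch of a workaround: write the single string column as plain text.
    // Note: unlike SaveMode.Overwrite, saveAsTextFile fails if the path already exists.
    def writeRawText(df: DataFrame, path: String): Unit = {
      df.coalesce(1)
        .rdd
        .map(_.getString(0))   // assumes the only column holds strings
        .saveAsTextFile(path)  // no quoting, trailing spaces preserved
    }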

0 answers:

No answers yet