Spark databricks csv removes trailing spaces from each line

Date: 2018-02-21 15:13:51

Tags: apache-spark

I have a DataFrame with only one column, and I am writing that DataFrame to an HDFS location using the databricks spark-csv package.

Every row has some trailing spaces, and I want those spaces to be kept when the DataFrame is stored.

My Spark version is 1.5 on CDH 5.5.

I am running the Spark job as follows:

 spark-submit --packages com.databricks:spark-csv_2.10:1.1.0 


df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").save(path)

Trailing spaces are included when the above write happens, but each line has quotes at the start and at the end of the line.
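For reference, a minimal sketch of this setup (the column name "line", the sample values, and the output path are assumptions for illustration only):

    import org.apache.spark.sql.{SQLContext, SaveMode}

    val sqlContext = new SQLContext(sc)          // sc: the existing SparkContext
    import sqlContext.implicits._

    // "line" is a hypothetical column name; the sample values end with trailing spaces
    val df = sc.parallelize(Seq("foo   ", "bar ")).toDF("line")

    // Default spark-csv write: trailing spaces survive, but each value comes out
    // wrapped in double quotes, e.g. "foo   "
    df.coalesce(1)
      .write.mode(SaveMode.Overwrite)
      .format("com.databricks.spark.csv")
      .save("/tmp/out")                          // hypothetical HDFS path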

So I tried the following:

df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").option("quote","\u0000").save(path)

The starting and ending quotes are removed, but the trailing spaces are removed as well.

So I tried the following:

df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").option("ignoreLeadingWhiteSpace","false").option("ignoreTrailingWhiteSpace","false").option("quote","\u0000").save(path)

No impact. The starting and ending quotes are removed, but the trailing spaces are removed as well.

I don't want the quotes to be stored, but I do want the trailing spaces to be kept.

How can I achieve this?
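One possible workaround, sketched below under the assumption that the single column holds plain strings (the helper name writeRawText is only illustrative): since the DataFrame has just one column, the CSV writer can be bypassed and the raw values written as text; saveAsTextFile applies no quoting, so trailing spaces are left untouched.

    import org.apache.spark.sql.DataFrame

    // Sketch of a workaround: write the single string column as plain text.
    // Note: unlike SaveMode.Overwrite, saveAsTextFile fails if the path already exists.
    def writeRawText(df: DataFrame, path: String): Unit = {
      df.coalesce(1)
        .rdd
        .map(_.getString(0))   // assumes the only column holds strings
        .saveAsTextFile(path)  // no quoting, trailing spaces preserved
    }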

0 answers:

No answers yet