I have a DataFrame with only one column, and I am writing that DataFrame to an HDFS location using the Databricks spark-csv package.
Each row has some trailing spaces, and I want those spaces to be preserved when the DataFrame is stored.
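For reference, here is a minimal sketch of the kind of DataFrame involved (the column name "value" and the sample rows are hypothetical, only for illustration):

import org.apache.spark.sql.{Row, SQLContext, SaveMode}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val sqlContext = new SQLContext(sc)  // sc is the active SparkContext; SaveMode is used by the write calls below
val schema = StructType(Seq(StructField("value", StringType)))
// a single string column where every row carries trailing spaces that must survive the write
val rows = sc.parallelize(Seq(Row("foo   "), Row("bar "), Row("baz  ")))
val df = sqlContext.createDataFrame(rows, schema)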
My Spark version is 1.5 (CDH 5.5).
I am running the Spark job as follows:
spark-submit --packages com.databricks:spark-csv_2.10:1.1.0
df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").save(path)
Trailing spaces are included when the above write happens, but each line has quotes at the start and at the end of the line.
So I tried the following:
df.coalesce(1).write.mode(SaveMode.Overwrite).format("com.databricks.spark.csv").option("quote","\u0000").save(path)
With this, the starting and ending quotes are removed, but the trailing spaces are also removed.
So I tried the following:
df.coalesce(1).write.mode(SaveMode.Overwrite)
  .format("com.databricks.spark.csv")
  .option("ignoreLeadingWhiteSpace", "false")
  .option("ignoreTrailingWhiteSpace", "false")
  .option("quote", "\u0000")
  .save(path)
No impact: the starting and ending quotes are still removed, and so are the trailing spaces.
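For reference, a quick sketch of how the raw output can be inspected (path is the same HDFS location used above; the brackets just make trailing spaces, or their absence, visible):

// print a few raw lines wrapped in brackets so quotes and trailing whitespace are easy to see
sc.textFile(path).take(5).foreach(line => println("[" + line + "]"))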
I do not want the quotes to be stored, but I do want to keep the trailing spaces.
How can I achieve this?