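I'm trying to load a CSV through a Spark session, but I'm running into a problem with strings that contain both double quotes and commas, e.g.: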
"""A"" STAR ACCOUNTING,& TRAINING SOLUTIONS LIMITED"
Spark splits the string above into two separate columns in the DataFrame. Output:
"""A"" STAR ACCOUNTING
& TRAINING SOLUTIONS LIMITED"
This is how I read the CSV through the Spark session:
val df = ss.read
  .option("header", true)
  .option("ignoreLeadingWhiteSpace", "true")
  .csv(csvFile)
  .sort(id)
Is there any way to read the CSV file so that commas inside a quoted string are not treated as delimiters?
Answer 0: (score: 1)
It looks like your data uses " as the escape character, whereas the default is \. You should provide the escape option when reading:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_151)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.read.option("escape", "\"").csv(Seq("\"\"\"A\"\" STAR ACCOUNTING,& TRAINING SOLUTIONS LIMITED\"").toDS).show(false)
+------------------------------------------------+
|_c0                                             |
+------------------------------------------------+
|"A" STAR ACCOUNTING,& TRAINING SOLUTIONS LIMITED|
+------------------------------------------------+
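
Applied to the reader from the question, this might look like the sketch below. It assumes a header row with columns named id and name (hypothetical, since the question does not show the header) and writes a throwaway sample file so that it can run on its own; the only change that matters is the added escape option.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Local session for this sketch; inside spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-escape-sketch")
  .getOrCreate()

// Hypothetical sample file: an assumed header plus the problematic value from the question.
val csvFile = Files.createTempFile("companies", ".csv")
Files.write(
  csvFile,
  "id,name\n1,\"\"\"A\"\" STAR ACCOUNTING,& TRAINING SOLUTIONS LIMITED\"\n".getBytes("UTF-8")
)

val df = spark.read
  .option("header", "true")
  .option("ignoreLeadingWhiteSpace", "true")
  .option("escape", "\"") // a doubled "" inside a quoted field is read as a literal "
  .csv(csvFile.toString)
  .sort("id")

df.show(false)
```

With the escape option set, the company name should stay in a single column instead of being split at the comma.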