PySpark 2 CSV read: quotes being ignored

Time: 2016-09-11 16:49:36

Tags: csv apache-spark pyspark apache-spark-sql pyspark-sql

tx = 'a,b,c,"[""d"", ""e""]""'
with open('temp.csv', 'wt') as f:
    f.write(tx)
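
The read below assumes an existing SparkSession bound to the name sparkSession; a minimal way to create one (not part of the original snippet, and the app name is made up) would be:

from pyspark.sql import SparkSession

# Hypothetical setup for the sparkSession variable used in the read call below.
sparkSession = SparkSession.builder.appName('csv-quote-example').getOrCreate()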


sparkSession.read.csv('temp.csv', quote='"').show()
+---+---+---+-------+---------+
|_c0|_c1|_c2|    _c3|      _c4|
+---+---+---+-------+---------+
|  a|  b|  c|"[""d""| ""e""]""|
+---+---+---+-------+---------+

Desired output:

+---+---+---+-----------------+
|_c0|_c1|_c2|              _c3|
+---+---+---+-----------------+
|  a|  b|  c|"[""d"", ""e""]""|
+---+---+---+-----------------+

1 answer:

Answer 0 (score: 0)

I'm not very familiar with PySpark, but the quotes look like the problem (there is one too many); the string should be:

'a,b,c,"[""d"", ""e""]"'

Then the output should be:

+---+---+---+----------+
|_c0|_c1|_c2|       _c3|
+---+---+---+----------+
|  a|  b|  c|["d", "e"]|
+---+---+---+----------+
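
A minimal end-to-end sketch of this fix (the app name is made up, and depending on the Spark version you may also need to pass escape='"' so the doubled quotes inside the quoted field are collapsed, since the default escape character is a backslash):

from pyspark.sql import SparkSession

# Write the corrected line: the field is quoted once, and quotes inside it are doubled.
tx = 'a,b,c,"[""d"", ""e""]"'
with open('temp.csv', 'wt') as f:
    f.write(tx)

spark = SparkSession.builder.appName('csv-quote-fix').getOrCreate()

# quote='"' marks the quoted field; escape='"' tells the parser that "" inside a
# quoted field stands for a literal quote (may be needed on some Spark versions).
df = spark.read.csv('temp.csv', quote='"', escape='"')
df.show(truncate=False)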