Spark Dataset write() method returns an error

Asked: 2018-02-01 12:48:07

Tags: xml csv hadoop apache-spark dataset

I am trying to load an XML file using the Databricks library and write the data out to a file, but I cannot write the output column (of type array<string>) to a CSV file.

I get the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: CSV data source does not support array<string> data type.

When I print the dataset, it looks like this:

+--------------------+
|             orgname|
+--------------------+
|[Muncy, Geissler,...|
|[Muncy, Geissler,...|
|[Knobbe Martens O...|
|[null, Telekta La...|
|[McAndrews, Held ...|
|[Notaro, Michalos...|
|                null|
|[Cowan, Liebowitz...|
|                null|
|[Kunzler Law Grou...|
|[null, null, Klei...|
|[Knobbe, Martens,...|
|[Merchant & Gould...|
|                null|
|[Culhane Meadows ...|
|[Culhane Meadows ...|
|[Vista IP Law Gro...|
|[Thompson & Knigh...|
|  [Fish & Tsang LLP]|
|                null|
+--------------------+

1 Answer:

Answer 0 (score: 2)

The exception should be self-explanatory: you cannot write an array to a CSV file.

You have to concatenate it into a single string first:
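In Spark itself this is usually done with `concat_ws`, e.g. `df.withColumn("orgname", concat_ws(",", col("orgname")))` before calling `write.csv(...)` (the column name `orgname` is taken from the printed output above; the separator is your choice). As a minimal illustration of the same idea outside of Spark, here is a plain-Python sketch that, like `concat_ws`, drops null elements and turns a null row into an empty string:

```python
import csv
import io

# Sample rows as they come out of an array<string> column: each value is
# either a list of strings (possibly containing None) or None entirely,
# mirroring the nulls visible in the printed dataset above.
rows = [
    ["Muncy, Geissler", "Olds & Lowe"],
    None,
    [None, "Telekta Law"],
]

def flatten(value, sep=","):
    """Join an array value into one string, skipping nulls (concat_ws-style)."""
    if value is None:
        return ""
    return sep.join(v for v in value if v is not None)

# Once each array is a single string, the CSV writer accepts it.
buf = io.StringIO()
writer = csv.writer(buf)
for row in rows:
    writer.writerow([flatten(row)])

print(buf.getvalue())
```

The key point is the same in both settings: the CSV data source only handles scalar column types, so the array must be reduced to a string (or exploded into separate rows) before the write.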