Question

I want to write the prediction DataFrame back to an Oracle database like this:

model.transform(testDate).write.mode(SaveMode.Overwrite).jdbc(URL, "b_spark_tst", prop)

But I get this error message:

Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for array<string>

Can anyone help me work out how to write this DataFrame to the database?

Thanks!

Update

This is what my DataFrame schema looks like:

root
 |-- CATEG: string (nullable = true)
 |-- COMM: string (nullable = true)
 |-- label: double (nullable = true)
 |-- words: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- features: vector (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = true)

Answer 1

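I ran into the same problem. It comes from the words field: it is an array whose elements are strings, and Spark's JDBC writer has no database column type it can map array<string> to. One solution is to save that array as a String.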

In my case, I had:

 |-- jurisdiction_names: array (nullable = true)
 |    |-- element: string (containsNull = true)

What I'm doing in PySpark is:

newDataFrame = completeDataFrame.select("jurisdiction_names")

and I get:

+--------------------+
|  jurisdiction_names|
+--------------------+
|             [Paris]|
|         [Amsterdam]|
|      [Santa Monica]|
|[DISTRICT OF COLU...|
|             [Paris]|
|[Illinois State, ...|
+--------------------+

With the new DataFrame, you can manipulate the information easily.
