我正在尝试将bulkCopyToSql数据库函数用于带有发现here的sql spark连接器的Microsoft sql server jdbc驱动程序。
这是启动spark shell的语法:
spark-shell --jars /developer/sqljdbc_6.4/enu/mssql-jdbc-6.4.0.jre8.jar,/developer/azure-sqldb-spark-master/azure-sqldb-spark-master/target/azure-sqldb-spark-1.0.0-jar-with-dependencies.jar
创建了bulkCopyConfig并在spark外壳中运行了以下代码行之后,当我运行以下命令时会生成错误:
df.bulkCopyToSqlDB(bulkCopyConfig)
完整的错误消息是:
Caused by: java.lang.NoSuchMethodError: com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions.setAllowEncryptedValueModifications(Z)V
at com.microsoft.azure.sqldb.spark.bulk.BulkCopyUtils$.getBulkCopyOptions(BulkCopyUtils.scala:109)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions.com$microsoft$azure$sqldb$spark$connect$DataFrameFunctions$$bulkCopy(DataFrameFunctions.scala:126)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:929)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
答案 0 :(得分:2)
这是由于mssql服务器jar的旧版本所致。确保将其更新到6.4.0或更高版本。如果您使用的是Maven,它将看起来像这样:
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>6.4.0.jre8</version>
</dependency>
请注意,以前将人工制品ID称为sqljdbc42
,而不是mssql-jdbc
。