Question

I am trying to load data into a Hive table, but I get the error "java.lang.RuntimeException: Multiple sources found for jdbc". Any help would be appreciated.

val url1="jdbc:hive2://xxxxxx.google.com:10000/jkl_cak_coh_batch;principal=hive/xxxxxx.google.com@internal.lllglobal.com;mapred.job.queue.name=io9;AuthMech=3;SSL=1;" +
          "SSLTrustStore=/usr/java/jdk1.8.0_144/jre/lib/security/oooacerts;user=xxxx;password=yyyyy"


val connectionProperties = new Properties()
connectionProperties.put("user", "xxxxxx")
connectionProperties.put("password", "xxxxxx")

sparkSession.sqlContext.sql("select * from " + tmpTable)
  .write
  .format("org.apache.spark.sql.execution.datasources.jdbc.DefaultSource")
  .mode(SaveMode.Append) // <--- Append in existing table
  .option("driver", driverName)
  .option("header", "false")
  .jdbc(url1, "sourceTable", connectionProperties)

Error:

java.lang.RuntimeException: Multiple sources found for jdbc (org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider, org.apache.spark.sql.execution.datasources.jdbc.DefaultSource), please specify the fully qualified class name.
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:591)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:424)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
        at com.rxxxrp.opanada_launus.LoadData.loadDFToDB(LoadData.scala:140)
        at com.rxxxrp.opanada_launus.jkanadaTaunus$.main(Tanadaaunus.scala:139)
        at com.rxxxrp.jkanada_tllunus.opanadalaunus.main(Tanadaaunus.scala)

Answer 1

Remove:

jdbc

It is not only unnecessary (you already use the format method) but also wrong (it does not specify a concrete implementation).

In general, if you want to go through format, use only .format("jdbc") together with option calls.
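One way to read this advice is sketched below: name the source once with format("jdbc"), pass the URL, table, and credentials as options, and finish with save() instead of jdbc(...). The object and helper names (JdbcWriteSketch, writeViaFormat) and the parameter list are hypothetical; the values are assumed to be the ones from the question. On a classpath where the short name jdbc is itself ambiguous, this form would presumably run into the same lookup, so treat it as a sketch of the usual case only.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object JdbcWriteSketch {
  // Hypothetical helper: writes `df` through the JDBC source selected by its short name.
  def writeViaFormat(df: DataFrame, url: String, table: String,
                     user: String, password: String, driverName: String): Unit =
    df.write
      .format("jdbc")            // short name, resolved by Spark's data source lookup
      .mode(SaveMode.Append)     // append to the existing table
      .option("url", url)
      .option("dbtable", table)
      .option("user", user)
      .option("password", password)
      .option("driver", driverName)
      .save()                    // save() keeps the format chosen above
}
```

With the names from the question, a call might look like JdbcWriteSketch.writeViaFormat(df, url1, "sourceTable", "xxxxxx", "xxxxxx", driverName).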

Answer 2

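The error clearly indicates that your cluster has two different classes registered for the jdbc format. So when you specify jdbc, you need to provide the fully qualified class name so that the code knows which class to use.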
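To see where the two registrations come from, one option is to list every copy of the service file that the data source lookup reads. The sketch below assumes the lookup goes through java.util.ServiceLoader over org.apache.spark.sql.sources.DataSourceRegister, which matches Spark 2.x as far as I know but is worth checking for your version; the object name FindJdbcProviders is hypothetical. If two jars show up, a mix of Spark SQL versions on the classpath may be the cause.

```scala
import scala.collection.JavaConverters._
import scala.io.Source

object FindJdbcProviders {
  // Service file assumed to be read by Spark's data source lookup.
  val ServiceFile = "META-INF/services/org.apache.spark.sql.sources.DataSourceRegister"

  // Returns (resource URL, provider class names) for each copy of the service file
  // visible to the given class loader.
  def registrations(loader: ClassLoader): List[(String, List[String])] =
    loader.getResources(ServiceFile).asScala.toList.map { url =>
      val src = Source.fromInputStream(url.openStream(), "UTF-8")
      try {
        val providers = src.getLines()
          .map(_.trim)
          .filter(line => line.nonEmpty && !line.startsWith("#")) // skip blanks and comments
          .toList
        (url.toString, providers)
      } finally src.close()
    }

  def main(args: Array[String]): Unit =
    registrations(Thread.currentThread().getContextClassLoader)
      .filter { case (_, providers) => providers.exists(_.contains(".jdbc.")) }
      .foreach { case (url, providers) => println(s"$url -> ${providers.mkString(", ")}") }
}
```

Running this with the same classpath as the failing job (for example from spark-shell on the cluster) prints one line per jar that registers a JDBC provider.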

You can use either of the classes as needed.

.format("org.apache.spark.sql.execution.datasources.jdbc.DefaultSource")

or

.format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")

If you use just jdbc, Spark cannot tell which class it should use, and it throws the error you are seeing.

java.lang.RuntimeException: Multiple sources found for jdbc
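Note that the stack trace in the question shows DataFrameWriter.jdbc calling save and the lookup failing for the short name jdbc, even though a fully qualified class had been passed to format. That suggests jdbc(...) replaces whatever format was set earlier. A minimal sketch that keeps the fully qualified name is therefore to pass url and dbtable as options and end with save(). The names QualifiedJdbcWrite, writeQualified, and connOptions are hypothetical, and which of the two provider classes actually exists depends on the Spark version running on the cluster.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object QualifiedJdbcWrite {
  // One of the two classes named in the error message (assumption: this is the
  // one that matches the Spark version on the cluster).
  val Provider = "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"

  // Hypothetical helper: `connOptions` would carry user, password, driver, and so on.
  def writeQualified(df: DataFrame, url: String, table: String,
                     connOptions: Map[String, String]): Unit =
    df.write
      .format(Provider)          // fully qualified name avoids the ambiguous short name
      .mode(SaveMode.Append)     // append to the existing table
      .options(connOptions + ("url" -> url) + ("dbtable" -> table))
      .save()                    // not .jdbc(...), which appears to look up "jdbc" again
}
```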
