Question

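I am trying to insert data from a Hive table into an Azure SQL DB table. The SQL DB table already exists, and I just want to overwrite its data using the following Scala JDBC write code. The code does write the data to the SQL DB table, but it also changes the table's DDL (data types and column names). How can I avoid that? I just want a plain insert into the table.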

Answer 1

See the following documentation: Connecting to SQL Databases using JDBC. It provides some examples of how to write data to JDBC.

This section shows how to write data to a database from an existing Spark SQL table named diamonds.

%sql -- quick test that this test table exists
select * from diamonds limit 5

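The following code saves the data into a database table named diamonds. Using column names that are reserved keywords can trigger an exception. The example table has a column named table, so you can rename it with withColumnRenamed() before pushing it to the JDBC API.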

spark.table("diamonds").withColumnRenamed("table", "table_number")
     .write
     .jdbc(jdbcUrl, "diamonds", connectionProperties)
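The snippets in this answer use jdbcUrl and connectionProperties without defining them. A minimal sketch of how they might be built for Azure SQL DB follows. The server name, database name, and credentials are placeholders chosen for illustration, not values from the question, and a real notebook would not hard-code the password.

```scala
import java.util.Properties

// Placeholder values (assumptions for illustration only).
val jdbcHostname = "myserver.database.windows.net"
val jdbcPort = 1433
val jdbcDatabase = "mydb"

// SQL Server / Azure SQL JDBC URL: host and port, then ;-separated properties.
val jdbcUrl = s"jdbc:sqlserver://$jdbcHostname:$jdbcPort;database=$jdbcDatabase"

// Credentials and the driver class are handed to Spark as java.util.Properties.
val connectionProperties = new Properties()
connectionProperties.setProperty("user", "my_user")         // placeholder
connectionProperties.setProperty("password", "my_password") // placeholder
connectionProperties.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
```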

Spark automatically creates the database table with an appropriate schema derived from the DataFrame schema.
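When Spark creates the table itself, the column types it picks can be overridden with the JDBC writer's createTableColumnTypes option. The sketch below assumes that spark, jdbcUrl, and connectionProperties are already defined, that diamonds has string columns named cut and color, and that diamonds_typed is a hypothetical target table that does not exist yet.

```scala
// Assumes `spark`, `jdbcUrl`, and `connectionProperties` already exist.
// `diamonds_typed` is a hypothetical new table; `cut` and `color` are assumed
// to be string columns of the diamonds table.
spark.table("diamonds").withColumnRenamed("table", "table_number")
     .write
     // Only takes effect when Spark creates the table; unlisted columns keep the defaults.
     .option("createTableColumnTypes", "cut VARCHAR(32), color CHAR(1)")
     .jdbc(jdbcUrl, "diamonds_typed", connectionProperties)
```

The option has no effect on a table that already exists, so it does not by itself preserve the DDL of an existing table.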

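The default behavior is to create a new table and to throw an error if a table with the same name already exists. You can change this behavior with the Spark SQL SaveMode feature. For example, here is how to append more rows to the table: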

import org.apache.spark.sql.SaveMode

spark.sql("select * from diamonds limit 10").withColumnRenamed("table", "table_number")
     .write
     .mode(SaveMode.Append) // <--- Append to the existing table
     .jdbc(jdbcUrl, "diamonds", connectionProperties)

You can also overwrite an existing table:

spark.table("diamonds").withColumnRenamed("table", "table_number")
     .write
     .mode(SaveMode.Overwrite) // <--- Overwrite the existing table
     .jdbc(jdbcUrl, "diamonds", connectionProperties)
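By default, SaveMode.Overwrite drops the existing table and recreates it from the DataFrame schema, which would explain the DDL change described in the question. Spark's JDBC writer has a truncate option that asks Spark to truncate the existing table instead of dropping and recreating it. A minimal sketch, assuming spark, jdbcUrl, and connectionProperties are defined as in the snippets above and that the target table already exists with the desired DDL:

```scala
import org.apache.spark.sql.SaveMode

// Assumes `spark`, `jdbcUrl`, and `connectionProperties` already exist and that
// the target table "diamonds" is already present in the database.
spark.table("diamonds").withColumnRenamed("table", "table_number")
     .write
     .mode(SaveMode.Overwrite)
     .option("truncate", "true") // truncate the table instead of drop + create
     .jdbc(jdbcUrl, "diamonds", connectionProperties)
```

Because the table is kept, the DataFrame's column names and types need to be compatible with the existing table definition. If the existing rows should be kept as well, SaveMode.Append is the other mode that leaves the DDL untouched.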

Hope this helps.
