Problem writing data to Delta Lake in Azure Databricks ("Incompatible format detected")

Time: 2019-07-16 08:21:28

Tags: databricks azure-databricks delta-lake

I need to read a dataset into a DataFrame and then write the data to Delta Lake, but I get the following exception:

AnalysisException: 'Incompatible format detected.

You are trying to write to `dbfs:/user/class@azuredatabrickstraining.onmicrosoft.com/delta/customer-data/` using Databricks Delta, but there is no
transaction log present. Check the upstream job to make sure that it is writing
using format("delta") and that you are trying to write to the table base path.

To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.azuredatabricks.net/delta/index.html
;

Here is the code that runs before the exception:

from pyspark.sql.types import StructType, StructField, DoubleType, IntegerType, StringType

inputSchema = StructType([
  StructField("InvoiceNo", IntegerType(), True),
  StructField("StockCode", StringType(), True),
  StructField("Description", StringType(), True),
  StructField("Quantity", IntegerType(), True),
  StructField("InvoiceDate", StringType(), True),
  StructField("UnitPrice", DoubleType(), True),
  StructField("CustomerID", IntegerType(), True),
  StructField("Country", StringType(), True)
])

# read the raw CSV data using the explicit schema (inputPath is defined elsewhere)
rawDataDF = (spark.read
  .option("header", "true")
  .schema(inputSchema)
  .csv(inputPath)
)

# write to Delta Lake
rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath) 
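
As a side note, one way to see what is already at the target path is to list it: a genuine Delta table keeps a `_delta_log` directory at its root. Below is a minimal diagnostic sketch, assuming it runs in a Databricks notebook (where `dbutils` is available) and that `DataPath` is the same variable used above:

# List the target path; a Delta table would contain a `_delta_log/` directory.
files = dbutils.fs.ls(DataPath)
for f in files:
    print(f.path)

# Directory names from dbutils.fs.ls carry a trailing slash, hence the rstrip.
has_delta_log = any(f.name.rstrip("/") == "_delta_log" for f in files)
print("Delta transaction log present:", has_delta_log)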

1 Answer:

Answer 0 (score: 0)

This error message tells you that there is already data at the destination path (in this case dbfs:/user/class@azuredatabrickstraining.onmicrosoft.com/delta/customer-data/) and that this data is not in Delta format (i.e., there is no transaction log). You can either choose a new path (which, based on the comments above, appears to be what you did) or delete that directory and try again.
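
A minimal sketch of the "delete and retry" option, assuming a Databricks notebook where `dbutils` is available and `rawDataDF`/`DataPath` are defined as in the question. Note that this permanently removes everything under the path:

# WARNING: recursively deletes all existing data at the target path.
dbutils.fs.rm("dbfs:/user/class@azuredatabrickstraining.onmicrosoft.com/delta/customer-data/", recurse=True)

# Retry the original Delta write.
rawDataDF.write.mode("overwrite").format("delta").partitionBy("Country").save(DataPath)

As the error message itself notes, the check can also be disabled with spark.conf.set("spark.databricks.delta.formatCheck.enabled", "false"), but that only suppresses the safety check; it does not make the existing non-Delta data readable as a Delta table.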