Question

I am trying to drop a Delta Lake table that was created with writeStream. I tried DROP TABLE, but it failed.

#table created as
df.writeStream.outputMode("append").format("delta").start("/mnt/mytable")

#attempt to drop table
spark.sql("drop table '/mnt/mytable'")

Answer 1

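Make sure you are using the correct schema, because even after you drop the table, the data still resides in the path defined in the DDL. If you rerun the job, it will therefore infer the old schema. In that case you may want to list and inspect the files with %fs ls /mnt/data/blah/blah/blah and then, if you know what you are doing, delete them with %fs rm -r /mnt/data/that/blah/path/here.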
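The inspect-then-delete step can also be done from Python instead of the %fs magics. This is only a rough sketch: the helper name list_then_remove and its confirm flag are made up for illustration, and dbutils is assumed to be the object a Databricks notebook provides, passed in explicitly.

```python
def list_then_remove(dbutils, path, confirm=False):
    """List what is stored under `path`; delete it only when `confirm` is True.

    `dbutils` is assumed to be the Databricks notebook utility object.
    """
    # dbutils.fs.ls returns FileInfo entries; .path is each entry's full path.
    entries = [info.path for info in dbutils.fs.ls(path)]
    if confirm:
        # The second argument True makes the delete recursive, like `%fs rm -r`.
        dbutils.fs.rm(path, True)
    return entries
```

In a notebook you would probably call list_then_remove(dbutils, "/mnt/mytable") first, check the listing, and only then call it again with confirm=True.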

Answer 2

DROP TABLE IF EXISTS <unmanaged-table>    // deletes the metadata
dbutils.fs.rm("<your-s3-path>", true)   // deletes the data

DROP TABLE <managed-table> // deletes the metadata and the data

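You need to specify the path to delete the data of an unmanaged table, because for unmanaged tables Spark SQL manages only the metadata and you control the data location. With managed tables, Spark SQL manages both the metadata and the data, and the data is stored in the Databricks File System (DBFS) in your account. So to delete the data of an unmanaged table, you need to specify the path to the data.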
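Put together in Python, the two steps for an unmanaged table might look like the sketch below. The function name drop_unmanaged_table is hypothetical, and spark and dbutils are assumed to be the objects a Databricks notebook provides. A table that was only ever written to a path, as in the question, may have no metastore entry at all, so table_name is optional here and only the files get removed in that case.

```python
def drop_unmanaged_table(spark, dbutils, path, table_name=None):
    """Remove an unmanaged table: metadata first (if registered), then data.

    `spark` and `dbutils` are assumed to be the Databricks notebook objects.
    """
    if table_name is not None:
        # For an unmanaged table this drops only the metastore entry.
        spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    # The files are not touched by DROP TABLE; remove them recursively.
    dbutils.fs.rm(path, True)
```

For the table in the question, a call such as drop_unmanaged_table(spark, dbutils, "/mnt/mytable") would be the likely shape, since no table name was registered there.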

How to drop an unmanaged Delta Lake table
