How do I create an EXTERNAL TABLE in Azure Databricks that reads from Azure Data Lake Store? I'm having trouble telling from the documentation whether this is even possible. I have a set of CSV files in a specific folder in Azure Data Lake Store, and I would like to do a CREATE EXTERNAL TABLE in Azure Databricks that points to those CSV files.
Answer 0 (score: 2)
You can mount Azure Data Lake Store (ADLS) into Azure Databricks DBFS (requires the 4.0 runtime or higher):
# Get Azure Data Lake Store credentials from the secret store
clientid = dbutils.preview.secret.get(scope = "adls", key = "clientid")
credential = dbutils.preview.secret.get(scope = "adls", key = "credential")
refreshurl = dbutils.preview.secret.get(scope = "adls", key = "refreshurl")
accounturl = dbutils.preview.secret.get(scope = "adls", key = "accounturl")
# Mount the ADLS
configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": clientid,
           "dfs.adls.oauth2.credential": credential,
           "dfs.adls.oauth2.refresh.url": refreshurl}

dbutils.fs.mount(
    source = accounturl,
    mount_point = "/mnt/adls",
    extra_configs = configs)
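Once mounted, it's worth sanity-checking the mount before creating any tables. A minimal sketch (the productscsv folder is taken from the table example below; adjust to your own path):

# List the mount point to confirm the ADLS folder is visible
display(dbutils.fs.ls("/mnt/adls"))

# Read the CSV files directly through the mount point
df = spark.read.csv("/mnt/adls/productscsv/", header=True, inferSchema=True)
df.show(5)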
Table creation then works the same as with DBFS. Just reference the mount point plus the directory within ADLS, e.g.:
%sql
CREATE TABLE product
USING CSV
OPTIONS (header "true", inferSchema "true")
LOCATION "/mnt/adls/productscsv/"
The LOCATION clause automatically implies EXTERNAL. See also the Azure Databricks documentation.
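You can confirm the table was indeed created as external by inspecting the detailed table info; a quick check from Python (the table name product comes from the snippet above):

# The 'Type' row in the detailed output should read EXTERNAL
spark.sql("DESCRIBE TABLE EXTENDED product").show(100, truncate=False)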
Answer 1 (score: 0)
You should consider looking at the following link: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html
Access Azure Data Lake Store using the Spark API. To read from your Data Lake Store account, you can configure Spark to use service credentials with the following snippet in your notebook:
spark.conf.set(" dfs.adls.oauth2.access.token.provider.type"," ClientCredential") spark.conf.set(" dfs.adls.oauth2.client.id"," {您的服务客户ID}") spark.conf.set(" dfs.adls.oauth2.credential"," {YOUR SERVICE CREDENTIALS}") spark.conf.set(" dfs.adls.oauth2.refresh.url"," https://login.microsoftonline.com/ {你的目录ID} / oauth2 / token")
It doesn't mention using external tables, however.