Azure Data Lake Store as an EXTERNAL TABLE in Databricks

Asked: 2018-03-28 19:13:20

Tags: azure azure-storage azure-data-lake databricks

How can I create an EXTERNAL TABLE in Azure Databricks that reads from Azure Data Lake Store? I am having trouble seeing in the documentation whether this is even possible. I have a set of CSV files in a specific folder in Azure Data Lake Store, and I would like to do a CREATE EXTERNAL TABLE in Azure Databricks that points to those CSV files.

2 Answers:

Answer 0 (score: 2)

You can mount an Azure Data Lake Store (ADLS) account into Azure Databricks DBFS (requires runtime 4.0 or higher):

    # Get Azure Data Lake Store credentials from the secret store
    clientid = dbutils.preview.secret.get(scope = "adls", key = "clientid")
    credential = dbutils.preview.secret.get(scope = "adls", key = "credential")
    refreshurl = dbutils.preview.secret.get(scope = "adls", key = "refreshurl")
    accounturl = dbutils.preview.secret.get(scope = "adls", key = "accounturl")

    # Mount the ADLS
    configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
       "dfs.adls.oauth2.client.id": clientid,
       "dfs.adls.oauth2.credential": credential,
       "dfs.adls.oauth2.refresh.url": refreshurl}

    dbutils.fs.mount(
       source = accounturl,
       mount_point = "/mnt/adls",
       extra_configs = configs)
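
To sanity-check the mount, you can list the mounted folder; a minimal sketch, assuming the /mnt/adls mount point from the dbutils.fs.mount call above:

    # List the contents of the mounted ADLS folder to verify the mount worked
    display(dbutils.fs.ls("/mnt/adls"))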

Table creation then works the same way as with DBFS. Just reference the mount point plus your directory in ADLS, e.g.:

    %sql 
    CREATE TABLE product
    USING CSV
    OPTIONS (header "true", inferSchema "true")
    LOCATION "/mnt/adls/productscsv/"

The LOCATION clause automatically implies EXTERNAL. See also the Azure Databricks documentation.
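
If you want to confirm that the table really is external, a minimal sketch using DESCRIBE TABLE EXTENDED (assuming the product table created above):

    # Inspect the table metadata; the "Type" row should show EXTERNAL
    spark.sql("DESCRIBE TABLE EXTENDED product").show(50, truncate=False)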

Answer 1 (score: 0)

You should consider looking at the following link: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-datalake.html

Access Azure Data Lake Store using the Spark API

To read from your Data Lake Store account, you can configure Spark to use service credentials with the following snippet in your notebook:

    spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
    spark.conf.set("dfs.adls.oauth2.client.id", "{YOUR SERVICE CLIENT ID}")
    spark.conf.set("dfs.adls.oauth2.credential", "{YOUR SERVICE CREDENTIALS}")
    spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/{YOUR DIRECTORY ID}/oauth2/token")
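
With those settings applied you can also read the CSV files directly over an adl:// URL, without mounting; a minimal sketch, where the account name and folder are hypothetical placeholders:

    # Read CSVs straight from ADLS; the account name and path below are hypothetical
    df = spark.read.csv("adl://youraccount.azuredatalakestore.net/productscsv/",
                        header=True, inferSchema=True)
    df.show(5)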

It doesn't mention using external tables, though.