Why can't Databricks Python read from my Azure Data Lake Storage Gen1?

Time: 2019-07-25 13:50:00

Tags: python pyspark azure-data-lake databricks azure-databricks

I am trying to read a file from Azure Data Lake Storage Gen1 in a Databricks notebook, using syntax inspired by the documentation:

    configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
               "dfs.adls.oauth2.client.id": "123abc-1e42-31415-9265-12345678",
               "dfs.adls.oauth2.credential": dbutils.secrets.get(scope = "adla", key = "adlamaywork"),
               "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token"}

    dbutils.fs.mount(
        source = "adl://myadls.azuredatalakestore.net/mydir",
        mount_point = "/mnt/adls",
        extra_configs = configs)

    post_processed = spark.read.csv("/mnt/adls/mycsv.csv").collect()
    post_processed.head(10).to_csv("/dbfs/processed.csv")

    dbutils.fs.unmount("/mnt/adls")
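As an aside, dbutils.fs.mount fails if /mnt/adls is still mounted from an earlier run; a minimal guard sketch, using the mount point name above:

    # Sketch: unmount a leftover mount point before mounting again, since
    # mounting over an existing mount point raises an error.
    if any(m.mountPoint == "/mnt/adls" for m in dbutils.fs.mounts()):
        dbutils.fs.unmount("/mnt/adls")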

My client 123abc-1e42-31415-9265-12345678 has access to the Data Lake Storage myadls, and I have created the secret with databricks secrets put --scope adla --key adlamaywork.
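Before mounting, the scope and key can be sanity-checked from the notebook (a sketch, using the names above):

    # Sketch: verify that the 'adla' scope and 'adlamaywork' key are visible
    # from this workspace before using them in the mount configs.
    print([s.name for s in dbutils.secrets.listScopes()])  # should include 'adla'
    print([k.key for k in dbutils.secrets.list("adla")])   # should include 'adlamaywork'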

When I execute the pyspark code above in the Databricks notebook, accessing the csv file with spark.read.csv, I get

    com.microsoft.azure.datalake.store.ADLException: Error getting info for file /mydir/mycsv.csv

When browsing the dbfs with dbfs ls dbfs:/mnt/adls, the parent mount point seems to be there, but I get

    Error: b'{"error_code":"IO_ERROR","message":"Error fetching access token\nLast encountered exception thrown after 1 tries [HTTP0(null)]"}'

What am I doing wrong?
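One thing worth checking is whether the service principal can fetch a token at all, independently of Spark. A minimal sketch, assuming the requests library is available on the cluster and using the placeholder IDs from above:

    import requests

    # Sketch: request an OAuth2 token directly from Azure AD to verify the
    # service principal credentials outside of Spark. The resource URI is
    # the one used for ADLS Gen1.
    resp = requests.post(
        "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token",
        data={
            "grant_type": "client_credentials",
            "client_id": "123abc-1e42-31415-9265-12345678",
            "client_secret": dbutils.secrets.get(scope="adla", key="adlamaywork"),
            "resource": "https://datalake.azure.net/",
        },
    )
    print(resp.status_code)  # 200 means the credentials themselves are valid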

1 Answer:

Answer 0: (score: 0)

If you don't necessarily need to mount the directory into dbfs, you could try reading directly from adls, like this:

spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.access.token.provider", "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")
spark.conf.set("dfs.adls.oauth2.client.id", "123abc-1e42-31415-9265-12345678")
spark.conf.set("dfs.adls.oauth2.credential", dbutils.secrets.get(scope = "adla", key = "adlamaywork"))
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token")

csvFile = "adl://myadls.azuredatalakestore.net/mydir/mycsv.csv"

df = spark.read.format('csv').options(header='true', inferschema='true').load(csvFile)
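To reproduce the post-processing from the question, a small follow-up sketch (assuming the sample fits comfortably on the driver):

    # Sketch: pull the first 10 rows to the driver as a pandas DataFrame and
    # write them to DBFS, mirroring what the question attempts with
    # collect()/to_csv.
    sample = df.limit(10).toPandas()
    sample.to_csv("/dbfs/processed.csv", index=False)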