I am trying to read a file from Azure Data Lake Storage Gen1 in a Databricks notebook, using syntax inspired by the documentation:

configs = {"dfs.adls.oauth2.access.token.provider.type": "ClientCredential",
           "dfs.adls.oauth2.client.id": "123abc-1e42-31415-9265-12345678",
           "dfs.adls.oauth2.credential": dbutils.secrets.get(scope = "adla", key = "adlamaywork"),
           "dfs.adls.oauth2.refresh.url": "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token"}

dbutils.fs.mount(
    source = "adl://myadls.azuredatalakestore.net/mydir",
    mount_point = "/mnt/adls",
    extra_configs = configs)

post_processed = spark.read.csv("/mnt/adls/mycsv.csv").collect()
post_processed.head(10).to_csv("/dbfs/processed.csv")

dbutils.fs.unmount("/mnt/adls")
My client 123abc-1e42-31415-9265-12345678 has access to the Data Lake Storage myadls, and I have created the secret with

databricks secrets put --scope adla --key adlamaywork

When I execute the pyspark code above in the Databricks notebook, the spark.read.csv call fails with

com.microsoft.azure.datalake.store.ADLException: Error getting info for file /mydir/mycsv.csv
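As a sanity check, the secret scope and its keys can be listed from the notebook itself via dbutils (a verification step, not part of the original setup):

# Verify that the scope and key are visible to the workspace
print([s.name for s in dbutils.secrets.listScopes()])
print([k.key for k in dbutils.secrets.list("adla")])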
When browsing the dbfs with dbfs ls dbfs:/mnt/adls, the parent mount point seems to be there, but I get

Error: b'{"error_code":"IO_ERROR","message":"Error fetching access token\nLast encountered exception thrown after 1 tries [HTTP0(null)]"}'

What am I doing wrong?
Answer 0 (score: 0)
If you do not necessarily need to mount the directory into dbfs, you could try reading directly from adls instead, like this:
# Configure the OAuth client-credential flow for direct ADLS Gen1 access
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.access.token.provider", "org.apache.hadoop.fs.adls.oauth2.ConfCredentialBasedAccessTokenProvider")
spark.conf.set("dfs.adls.oauth2.client.id", "123abc-1e42-31415-9265-12345678")
spark.conf.set("dfs.adls.oauth2.credential", dbutils.secrets.get(scope = "adla", key = "adlamaywork"))
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/123456abc-2718-aaaa-9999-42424242abc/oauth2/token")

# Read through the adl:// URI instead of a dbfs mount point
csvFile = "adl://myadls.azuredatalakestore.net/mydir/mycsv.csv"
df = spark.read.format('csv').options(header='true', inferSchema='true').load(csvFile)
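From there, the rest of the pipeline from the question could look roughly like this (a sketch, assuming the goal is still to write the first ten rows to /dbfs/processed.csv):

# Sketch: persist the first 10 rows, mirroring the question's post-processing step
df.limit(10).toPandas().to_csv("/dbfs/processed.csv", index=False)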