I can connect to ADLS Gen2 from a notebook running on Azure Databricks, but the same connection fails when made from a job running a jar. Apart from the use of dbutils, I used the same settings in the jar as in the notebook: the Scala code sets the same Spark conf values that the notebook does.
Notebook:
spark.conf.set(
  "fs.azure.account.key.xxxx.dfs.core.windows.net",
  dbutils.secrets.get(scope = "kv-secrets", key = "xxxxxx"))
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "true")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://catalogs@xxxx.dfs.core.windows.net/test/sample.csv")
// Convert rdd to a data frame using toDF; the following import is
// required to use the toDF function.
import spark.implicits._
val df: DataFrame = rdd.toDF()
// Write file to parquet
df.write.parquet("abfss://catalogs@xxxx.dfs.core.windows.net/test/Sales.parquet")
Scala code:
import org.apache.spark.SparkContext
import org.apache.spark.sql.{DataFrame, SparkSession}

val sc = SparkContext.getOrCreate()
val spark = SparkSession.builder().getOrCreate()
sc.getConf.setAppName("Test")
sc.getConf.set("fs.azure.account.key.xxxx.dfs.core.windows.net",
  "<actual key>")
sc.getConf.set("fs.azure.account.auth.type", "OAuth")
sc.getConf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
sc.getConf.set("fs.azure.account.oauth2.client.id", "<app id>")
sc.getConf.set("fs.azure.account.oauth2.client.secret", "<app password>")
sc.getConf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
sc.getConf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
val sqlContext = spark.sqlContext
val rdd = sqlContext.read.format("csv")
  .option("header", "true")
  .load("abfss://catalogs@xxxx.dfs.core.windows.net/test/sample.csv")
// Convert rdd to a data frame using toDF; the following import is
// required to use the toDF function.
import spark.implicits._
val df: DataFrame = rdd.toDF()
println(df.count())
// Write file to parquet
df.write.parquet("abfss://catalogs@xxxx.dfs.core.windows.net/test/Sales.parquet")
I expect the parquet file to be written. Instead, I get the following error:

19/04/20 13:58:40 ERROR Uncaught throwable from user code: Configuration property xxxx.dfs.core.windows.net not found.
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AbfsConfiguration.getStorageAccountKey(AbfsConfiguration.java:385)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.initializeClient(AzureBlobFileSystemStore.java:802)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.<init>(AzureBlobFileSystemStore.java:133)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:103)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
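The error means the ABFS driver could not find fs.azure.account.key.xxxx.dfs.core.windows.net in the configuration it was initialized with. A minimal diagnostic sketch (assuming the same spark session as above) to check whether the setting ever reached the session:

val key = "fs.azure.account.key.xxxx.dfs.core.windows.net"
// Session-level runtime conf, as written by spark.conf.set in the notebook
println(spark.conf.getOption(key)) // None in the failing job
// Base Hadoop configuration of the SparkContext
println(spark.sparkContext.hadoopConfiguration.get(key)) // null in the failing job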
Answer 0 (score: 0)
Never mind, silly mistake. It should be:
val sc = SparkContext.getOrCreate()
val spark = SparkSession.builder().getOrCreate()
sc.getConf.setAppName("Test")
spark.conf.set("fs.azure.account.key.xxxx.dfs.core.windows.net",
  "<actual key>")
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type",
  "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id", "<app id>")
spark.conf.set("fs.azure.account.oauth2.client.secret", "<app password>")
spark.conf.set("fs.azure.account.oauth2.client.endpoint",
  "https://login.microsoftonline.com/<tenant id>/oauth2/token")
spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")
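The fix works because SparkContext.getConf returns a copy of the context's configuration, which cannot be changed at runtime, so the fs.azure settings in the original job never reached the ABFS driver. spark.conf.set, in contrast, writes to the session's runtime configuration, which Spark SQL merges into the Hadoop configuration used for DataFrame reads and writes. Put together as a self-contained jar entry point (a sketch only; the AdlsGen2Job object name and the angle-bracketed placeholders are illustrative):

import org.apache.spark.sql.{DataFrame, SparkSession}

object AdlsGen2Job {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("Test").getOrCreate()

    // Session-level settings are merged into the Hadoop conf seen by ABFS.
    spark.conf.set("fs.azure.account.key.xxxx.dfs.core.windows.net", "<actual key>")
    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type",
      "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", "<app id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret", "<app password>")
    spark.conf.set("fs.azure.account.oauth2.client.endpoint",
      "https://login.microsoftonline.com/<tenant id>/oauth2/token")
    spark.conf.set("fs.azure.createRemoteFileSystemDuringInitialization", "false")

    val df: DataFrame = spark.read.format("csv")
      .option("header", "true")
      .load("abfss://catalogs@xxxx.dfs.core.windows.net/test/sample.csv")
    println(df.count())

    df.write.parquet("abfss://catalogs@xxxx.dfs.core.windows.net/test/Sales.parquet")
  }
}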