I am currently trying to connect Azure Blob storage to PySpark and am having trouble getting the two to connect and run. I have installed the two required jar files (hadoop-azure-3.2.0-javadoc.jar and azure-storage-8.3.0-javadoc.jar). I set them to be read in via
SparkConf().setAll()
and once the session is started I use:
spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")
sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")
but it always returns
java.io.IOException: No FileSystem for scheme: wasbs
Any ideas?
These are what I have already followed:
https://github.com/Azure/mmlspark/issues/456
PySpark java.io.IOException: No FileSystem for scheme: https
spark-shell error : No FileSystem for scheme: wasb
import findspark
findspark.init('dir/spark/spark-2.4.0-bin-hadoop2.7')
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SQLContext
conf = SparkConf().setAll([
    (u'spark.submit.pyFiles', u'/dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar,/dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,/dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,/dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,/dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,/dir/.ivy2/jars/joda-time_joda-time-2.3.jar,/dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,/dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,/dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.app.id', u'local-1553969107475'),
    (u'spark.driver.port', u'38809'),
    (u'spark.executor.id', u'driver'),
    (u'spark.app.name', u'PySparkShell'),
    (u'spark.driver.host', u'test-VM'),
    (u'spark.sql.catalogImplementation', u'hive'),
    (u'spark.rdd.compress', u'True'),
    (u'spark.serializer.objectStreamReset', u'100'),
    (u'spark.master', u'local[*]'),
    (u'spark.submit.deployMode', u'client'),
    (u'spark.repl.local.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'),
    (u'spark.files', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar,file:///dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,file:///dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar'),
    (u'spark.ui.showConsoleProgress', u'true')
])
sc = SparkContext(conf=conf)
spark = SparkSession(sc)
spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")
sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")
which returns
java.io.IOException: No FileSystem for scheme: wasbs
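For context, my understanding from the linked threads is that Hadoop resolves each URI scheme to a FileSystem class via an `fs.<scheme>.impl` key, so the `wasbs` scheme needs its own mapping; the `set("fs.azure", ...)` line in my code configures a different key. Below is a minimal sketch of the configuration I believe is expected, written as a plain dict so the key names are visible. The `fs.wasbs.impl` mapping and the helper name are my assumptions, not something I have confirmed working:

```python
# Sketch of the Hadoop configuration keys I believe wasbs:// access needs.
# The fs.wasbs.impl mapping is an assumption taken from the linked threads;
# without a mapping for the "wasbs" scheme, Hadoop raises
# "No FileSystem for scheme: wasbs".
def azure_blob_hadoop_conf(account, key):
    return {
        # maps the wasbs:// scheme to a FileSystem implementation class (assumed)
        "fs.wasbs.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
        # storage-account key, same pattern as in my code above
        "fs.azure.account.key.%s.blob.core.windows.net" % account: key,
    }

# Applied to the session the same way as above, e.g.:
# for k, v in azure_blob_hadoop_conf("acctname", "key").items():
#     spark._jsc.hadoopConfiguration().set(k, v)
```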