Using PySpark with WASB / Error connecting PySpark to Azure Blob

Date: 2019-07-17 18:58:42

Tags: azure pyspark

I am currently trying to connect Azure Blob Storage with PySpark and am having trouble getting the two to connect and run. I have installed the two required jar files (hadoop-azure-3.2.0-javadoc.jar and azure-storage-8.3.0-javadoc.jar) and set them to be loaded via SparkConf().setAll(). Once the session starts, I run:

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")

but it always returns


java.io.IOException: No FileSystem for scheme: wasbs

Any ideas?
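For context, the two hadoopConfiguration calls above set the WASB driver class and the storage account key; the property name and the wasbs:// URL follow a fixed pattern. A minimal pure-Python sketch of how those strings are composed (account and container names here are placeholders, not real resources):

```python
# Sketch: compose the Hadoop property name and the wasbs:// URL used
# when reading from Azure Blob Storage with the WASB driver.

def wasb_account_key_property(account: str) -> str:
    # Property that holds the storage account key for the given account.
    return f"fs.azure.account.key.{account}.blob.core.windows.net"

def wasbs_url(container: str, account: str, path: str = "/") -> str:
    # wasbs:// URL layout: container first, then account, then blob path.
    return f"wasbs://{container}@{account}.blob.core.windows.net{path}"

print(wasb_account_key_property("acctname"))
# fs.azure.account.key.acctname.blob.core.windows.net
print(wasbs_url("container", "acctname"))
# wasbs://container@acctname.blob.core.windows.net/
```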

I have already followed the suggestions in:

https://github.com/Azure/mmlspark/issues/456

PySpark java.io.IOException: No FileSystem for scheme: https

spark-shell error : No FileSystem for scheme: wasb

# Locate the local Spark installation before importing pyspark
import findspark
findspark.init('dir/spark/spark-2.4.0-bin-hadoop2.7')

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SQLContext

conf = SparkConf().setAll([(u'spark.submit.pyFiles', u'/dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar,/dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,/dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,/dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,/dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,/dir/.ivy2/jars/joda-time_joda-time-2.3.jar,/dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,/dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,/dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.app.id', u'local-1553969107475'), (u'spark.driver.port', u'38809'), (u'spark.executor.id', u'driver'), (u'spark.app.name', u'PySparkShell'), (u'spark.driver.host', u'test-VM'), (u'spark.sql.catalogImplementation', u'hive'), (u'spark.rdd.compress', u'True'),(u'spark.serializer.objectStreamReset', u'100'), (u'spark.master', u'local[*]'), (u'spark.submit.deployMode', u'client'), (u'spark.repl.local.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.files', 
u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar,file:///dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,file:///dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar'), (u'spark.ui.showConsoleProgress', u'true')])

sc = SparkContext(conf=conf)
spark = SparkSession(sc)

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")

returns


java.io.IOException: No FileSystem for scheme: wasbs
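One detail worth double-checking (an observation, not a confirmed diagnosis): both Azure JARs listed in the configuration are -javadoc artifacts, which contain documentation only and no compiled classes, so a filesystem implementation inside them could never be loaded. A small pure-Python sketch that flags such JARs in a spark.jars-style comma-separated list:

```python
# Flag -javadoc/-sources JARs in a spark.jars-style list: these ship
# no classes, so any FileSystem implementation they document never loads.
def non_class_jars(jar_list: str) -> list[str]:
    suspicious = []
    for jar in jar_list.split(","):
        name = jar.rsplit("/", 1)[-1]  # strip the directory part
        if name.endswith("-javadoc.jar") or name.endswith("-sources.jar"):
            suspicious.append(name)
    return suspicious

jars = ("/dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar,"
        "/dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar")
print(non_class_jars(jars))
# ['hadoop-azure-3.2.0-javadoc.jar', 'azure-storage-8.3.0-javadoc.jar']
```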

0 Answers:

There are no answers yet.