PySpark java.lang.NoClassDefFoundError: com/aliyun/oss/ServiceException

Date: 2018-07-16 02:54:31

Tags: pyspark open-source alibaba-cloud

PySpark with Aliyun's prebuilt SDK does not seem to work.

Env:

  • Apache Spark 2.2.0 on a VM
  • OSS created under the same user account
  • Tokyo region (ap-northeast-1)

Here is a snippet that reproduces the symptom.

Command:

/opt/spark/bin/pyspark --master="mesos://${MASTER}" --executor-memory 12g --jars /home/admin/aliyun-emapreduce-sdk/prebuild/emr-core-1.1.3-SNAPSHOT.jar,/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar --conf "spark.hadoop.fs.oss.impl"="com.aliyun.fs.oss.nat.NativeOssFileSystem"

PySpark code:

from pyspark import SparkConf
conf = SparkConf()
conf.set("spark.hadoop.fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")
conf.set("spark.executor.memory", "12g")
conf.set("spark.python.worker.memory", "8g")
from pyspark.sql import SparkSession
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# read from local hdfs
df = spark.read.parquet("hdfs://10.1.185.28:9000/User/admin/nyc/yellow.parquet")
# [failed] write to Aliyun OSS
outPathBase = "oss://MyOSSID:MySecretKey@oss-ap-northeast-1-internal.aliyuncs.com/test"
df.write.parquet(outPathBase+"/yellow.parquet")
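As a side note, a configuration-based way to pass the same credentials instead of embedding them in the URI might look like the sketch below; the property names fs.oss.accessKeyId, fs.oss.accessKeySecret and fs.oss.endpoint are my assumption from the aliyun-emapreduce-sdk documentation, not something I have verified:

# assumed property names, taken from the aliyun-emapreduce-sdk docs (unverified)
conf.set("spark.hadoop.fs.oss.accessKeyId", "MyOSSID")
conf.set("spark.hadoop.fs.oss.accessKeySecret", "MySecretKey")
conf.set("spark.hadoop.fs.oss.endpoint", "oss-ap-northeast-1-internal.aliyuncs.com")
# the output URI would then only need the bucket and path, e.g. oss://<bucket>/test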

Here is the error:

Py4JJavaError: An error occurred while calling o74.parquet.
: java.lang.NoClassDefFoundError: com/aliyun/oss/ServiceException
    at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.initialize(JetOssNativeFileSystemStore.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    (snip)
Caused by: java.lang.ClassNotFoundException: com.aliyun.oss.ServiceException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
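The missing class com.aliyun.oss.ServiceException comes from the Aliyun OSS Java SDK itself (the aliyun-sdk-oss artifact), so my guess is that the prebuilt emr jars do not bundle it and it has to be shipped alongside them. A sketch of the launch command with that jar added to --jars; the path and version placeholder are assumptions on my part:

/opt/spark/bin/pyspark --master="mesos://${MASTER}" --executor-memory 12g --jars /home/admin/aliyun-emapreduce-sdk/prebuild/emr-core-1.1.3-SNAPSHOT.jar,/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar,/path/to/aliyun-sdk-oss-<version>.jar --conf "spark.hadoop.fs.oss.impl"="com.aliyun.fs.oss.nat.NativeOssFileSystem"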

Any suggestions?

0 Answers:

There are no answers yet.