PySpark with Aliyun's prebuilt SDK does not seem to work.
Env:
Here is a snippet that reproduces the symptom.
命令:
/opt/spark/bin/pyspark --master="mesos://${MASTER}" --executor-memory 12g --jars /home/admin/aliyun-emapreduce-sdk/prebuild/emr-core-1.1.3-SNAPSHOT.jar,/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar --conf "spark.hadoop.fs.oss.impl"="com.aliyun.fs.oss.nat.NativeOssFileSystem"
PySpark code:
from pyspark import SparkConf
conf = SparkConf()
conf.set("spark.hadoop.fs.oss.impl", "com.aliyun.fs.oss.nat.NativeOssFileSystem")
conf.set("spark.executor.memory", "12g")
conf.set("spark.python.worker.memory", "8g")
from pyspark.sql import SparkSession
spark = SparkSession.builder.config(conf=conf).getOrCreate()
# read from local hdfs
df = spark.read.parquet("hdfs://10.1.185.28:9000/User/admin/nyc/yellow.parquet")
# [failed] write to Aliyun OSS
outPathBase = "oss://MyOSSID:MySecretKey@oss-ap-northeast-1-internal.aliyuncs.com/test"
df.write.parquet(outPathBase+"/yellow.parquet")
Here is the error I get:
Py4JJavaError: An error occurred while calling o74.parquet.
: java.lang.NoClassDefFoundError: com/aliyun/oss/ServiceException
at com.aliyun.fs.oss.nat.JetOssNativeFileSystemStore.initialize(JetOssNativeFileSystemStore.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
(snip)
Caused by: java.lang.ClassNotFoundException: com.aliyun.oss.ServiceException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
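For context, the `ClassNotFoundException` means `com.aliyun.oss.ServiceException` is missing from the runtime classpath; that class comes from the standalone Aliyun OSS Java SDK (`aliyun-sdk-oss`), which the two emr jars passed via `--jars` may not bundle. A small helper (an illustrative sketch; the jar path in the comment is an assumption) to check whether a given jar actually contains the class:

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the given class.

    class_name is in dotted form, e.g. "com.aliyun.oss.ServiceException".
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Example (the path below is an assumption; adjust to your layout):
# jar_contains_class(
#     "/home/admin/aliyun-emapreduce-sdk/prebuild/emr-sdk_2.10-1.1.3-SNAPSHOT.jar",
#     "com.aliyun.oss.ServiceException")
```

If none of the jars on `--jars` contain the class, that would explain the `NoClassDefFoundError` at write time.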
Any suggestions?