Hi all, I'm trying to connect to an S3 environment from Spark installed on my local Mac, launching it with the following command:
./bin/spark-shell --packages com.amazonaws:aws-java-sdk-pom:1.11.271,org.apache.hadoop:hadoop-aws:3.1.1,org.apache.hadoop:hadoop-hdfs:2.7.1
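For reference, here is a quick way to check which Hadoop version the shell actually loaded (a diagnostic sketch; org.apache.hadoop.util.VersionInfo ships with hadoop-common):

// Diagnostic sketch: print the hadoop-common version that spark-shell loaded.
// StreamCapabilities lives in newer hadoop-common releases, so an older
// hadoop-common on the classpath would be consistent with the error below.
org.apache.hadoop.util.VersionInfo.getVersion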
This brings up the Scala shell and downloads all of the libraries.
Then I run the following commands in the spark shell:
val accessKeyId = System.getenv("AWS_ACCESS_KEY_ID")
val secretAccessKey = System.getenv("AWS_SECRET_ACCESS_KEY")
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", accessKeyId)
hadoopConf.set("fs.s3.awsSecretAccessKey", secretAccessKey)
hadoopConf.set("fs.s3n.awsAccessKeyId", accessKeyId)
hadoopConf.set("fs.s3n.awsSecretAccessKey", secretAccessKey)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.json("s3a://path/1551467354353.c948f177e1fb.dev.0fd8f5fd-22d4-4523-b6bc-b68c181b4906.gz")
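(Side note: since the path uses the s3a:// scheme, my understanding is that the S3A connector reads its credentials from fs.s3a.* properties rather than the fs.s3/fs.s3n keys above; a minimal sketch using the standard Hadoop S3A property names:)

// S3A-specific credential properties (standard Hadoop S3A names); the
// fs.s3.* and fs.s3n.* keys above are read by the older connectors only.
hadoopConf.set("fs.s3a.access.key", accessKeyId)
hadoopConf.set("fs.s3a.secret.key", secretAccessKey)

That said, the error below occurs at class-loading time, before credentials come into play.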
But whether I use s3a or s3, I get NoClassDefFoundError: org/apache/hadoop/fs/StreamCapabilities
Any idea what I might be missing here?