I am trying to create a JavaRDD on an S3 file but am unable to create the RDD. Can someone help me resolve this problem?
Code:
SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.hadoopConfiguration().set("fs.s3.awsAccessKeyId",
        accessKey);
javaSparkContext.hadoopConfiguration().set("fs.s3.awsSecretAccessKey",
        secretKey);
javaSparkContext.hadoopConfiguration().set("fs.s3.impl",
        "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
JavaRDD<String> rawData = javaSparkContext
        .textFile("s3://mybucket/sample.txt");
This code throws the following exception:
2015-05-06 18:58:57 WARN LoadSnappy:46 - Snappy native library not loaded
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:50)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:177)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
at org.apache.spark.rdd.RDD.first(RDD.scala:1189)
at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:477)
at org.apache.spark.api.java.JavaRDD.first(JavaRDD.scala:32)
at com.cignifi.DataExplorationValidation.processFile(DataExplorationValidation.java:148)
at com.cignifi.DataExplorationValidation.main(DataExplorationValidation.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.failExpecting(URI.java:2835)
at java.net.URI$Parser.parse(URI.java:3038)
at java.net.URI.<init>(URI.java:753)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 36 more
More details:
Spark version 1.3.0.
Running in local mode using spark-submit (an example invocation is sketched below).
I tried this both locally and on an EC2 instance; in both cases I get the same error.
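For context, the spark-submit invocation in local mode looks roughly like this (the jar name is hypothetical; the main class is the one shown in the stack trace):

spark-submit --class com.cignifi.DataExplorationValidation --master local[*] data-exploration-validation.jar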