Spark: java.lang.IllegalArgumentException: Invalid hostname in URI s3:///<bucket-name>

Asked: 2015-08-09 16:28:38

Tags: scala amazon-s3 apache-spark emr

I wrote a sample Spark program in Scala to count the lines of a text file stored in Amazon S3. Below is my sample program.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import java.util.{Map => JMap}
import org.apache.hadoop.conf.Configuration

object CountLines {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("CountLines").setMaster("local"))
    sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "ABC")
    sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "XYZ")
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "ABC")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "XYZ")
    sc.hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    sc.hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
    val path = "s3:///my-bucket/test/test.txt"
    println("num lines: " + countLines(sc, path))
  }

  def countLines(sc: SparkContext, path: String): Long = {
    sc.textFile(path).count()
  }
}

Unfortunately, I am getting an IllegalArgumentException that appears to be related to the credentials. Below is the stack trace.

Exception in thread "main" java.lang.IllegalArgumentException: Invalid hostname in URI s3:/my-bucket/test/test.txt
        at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:45)
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:76)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)

I have provided valid credentials. I packaged the program as a JAR file and ran it on the cluster with the spark-submit command. I am not sure whether this is the correct way to set the access key and secret key in Spark. I have tried different approaches, but nothing seems to work. Any insight into this problem would be highly appreciated.

Thanks, J Joseph

1 Answer:

Answer 0 (score: 0):

You have an extra slash. You need to change s3:///my-bucket/test/test.txt to s3://my-bucket/test/test.txt.
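
For reference, here is a minimal sketch (using plain java.net.URI, independent of Spark) of why the extra slash triggers this exception: with s3:/// the authority component of the URI is empty, so the host, which Hadoop treats as the bucket name, parses as null. That null host is what S3Credentials.initialize rejects with "Invalid hostname in URI". The bucket name my-bucket is just the placeholder from the question.

import java.net.URI

object UriHostCheck {
  def main(args: Array[String]): Unit = {
    // Triple slash: the authority is empty, so the host (bucket) is null.
    println(new URI("s3:///my-bucket/test/test.txt").getHost)  // null
    // Double slash: the bucket parses as the URI host, as Hadoop expects.
    println(new URI("s3://my-bucket/test/test.txt").getHost)   // my-bucket
  }
}

With the corrected line val path = "s3://my-bucket/test/test.txt", the program above should read the file as intended, assuming the credentials themselves are valid.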