AWS EMR no host: hdfs:///var/log/spark/apps

Time: 2016-03-14 10:44:23

Tags: apache-spark amazon-emr

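I am trying out Spark 1.6.0 with Hadoop 2.7.0 on AWS EMR (emr-4.3.0). I created an EMR cluster and added a Step (in the AWS EMR web console) that runs my sample jar. It is a Spring Boot application written in Java 1.8 (I installed JDK 8 on the boxes).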

The step runs with the following command:

hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --class org.springframework.boot.loader.JarLauncher s3://my-test/SparkForSpring-S1.2014.jar

I create the SparkContext with the code below.

    SparkConf conf = new SparkConf().setAppName("SparkForSpring");
    return new JavaSparkContext(conf);

It fails with the following error. I don't think it is related to my application, but I am new to Spark and YARN.

Caused by: org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.apache.spark.api.java.JavaSparkContext com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext()] threw exception; nested exception is java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:188)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:586)
    ... 49 more
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1650)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:547)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext(SparkConfig.java:35)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b.CGLIB$javaSparkContext$0(<generated>)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b$$FastClassBySpringCGLIB$$10b15a77.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:312)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b.javaSparkContext(<generated>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:166)
    ... 50 more

I have read some of the documentation, but I am not sure how to fix this error. Any hints would be very helpful.

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html

1 answer:

Answer 0: (score: 0)

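I solved this by not using Spring Boot's executable jar. Instead, I used the maven-shade-plugin to package the Spring-related jars into a single jar, so the application runs on the system class loader. Here is the complete pom.xml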

I got the hint from an answer to this question: apache-spark 1.3.0 and yarn integration and spring-boot as a container
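
For readers wondering why the class loader matters here, the following is one plausible reading, not something the original poster confirmed. The URI hdfs:///var/log/spark/apps has a scheme but no host, and the stack trace shows DistributedFileSystem.initialize rejecting it. As far as I understand, Hadoop normally fills in the missing authority from the cluster's default filesystem setting (fs.defaultFS), so if the cluster configuration is not visible to the application, the URI stays incomplete. The sketch below uses only java.net.URI to illustrate that idea. The class HdfsUriCheck, the helper resolveAgainstDefault, and the host namenode.example are hypothetical names made up for illustration; this is a simplified imitation, not Hadoop's actual code.

```java
import java.net.URI;

public class HdfsUriCheck {

    // Hypothetical helper: a simplified imitation of how a URI with a scheme
    // but no authority could borrow the authority of a default filesystem URI.
    // It is an illustration only and does not call any Hadoop API.
    static URI resolveAgainstDefault(URI uri, URI defaultFs) {
        boolean sameScheme = uri.getScheme() != null
                && uri.getScheme().equals(defaultFs.getScheme());
        if (sameScheme && uri.getAuthority() == null && defaultFs.getAuthority() != null) {
            return URI.create(defaultFs.getScheme() + "://"
                    + defaultFs.getAuthority() + uri.getPath());
        }
        // Nothing to borrow: the URI is returned unchanged, still without a host.
        return uri;
    }

    public static void main(String[] args) {
        URI eventLogDir = URI.create("hdfs:///var/log/spark/apps");

        // The URI from the error message has no host component.
        System.out.println("host = " + eventLogDir.getHost());

        // Assumed cluster default (hypothetical host name and port).
        URI clusterDefault = URI.create("hdfs://namenode.example:8020");
        System.out.println(resolveAgainstDefault(eventLogDir, clusterDefault));

        // Assumed fallback when no cluster configuration is found: a local file default.
        URI localDefault = URI.create("file:///");
        System.out.println(resolveAgainstDefault(eventLogDir, localDefault));
    }
}
```

In the second case the host is filled in. In the third the URI is returned unchanged and still has no host, which matches the shape of the "Incomplete HDFS URI, no host" message. If this reading is right, packaging with the shade plugin may help simply because the application then loads from the system class loader, where the cluster's Hadoop configuration can be found.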