Class not found when running Spark on YARN

Date: 2016-12-22 17:20:55

Tags: apache-spark

The same code runs fine on Spark standalone, but it fails when I run Spark on YARN. The exception, thrown inside the executor (YARN container), is: java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.common.xcontent.json.JsonXContent. Yet when I build with the Maven assembly, the Elasticsearch jar is definitely included in the application assembly jar. The submit command is as follows:

spark-submit --executor-memory 10g --executor-cores 2 --num-executors 2 \
    --queue thejob --master yarn --class com.batch.TestBat /lib/batapp-mr.jar 2016-12-20

The Maven dependencies are as follows:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-catalyst_2.10</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.3</version>
    <!-- <scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop2-compat</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-hadoop-compat</artifactId>
    <version>1.2.0-cdh5.7.0</version>
    <!--<scope>provided</scope> -->
</dependency>


<dependency>
    <groupId>com.sksamuel.elastic4s</groupId>
    <artifactId>elastic4s-core_2.10</artifactId>
    <version>2.3.0</version>
    <!--<scope>provided</scope> -->
    <exclusions>
        <exclusion>
            <artifactId>elasticsearch</artifactId>
            <groupId>org.elasticsearch</groupId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.3.1</version>
    <exclusions>
        <exclusion>
            <artifactId>log4j-over-slf4j</artifactId>
            <groupId>org.slf4j</groupId>
        </exclusion>
    </exclusions>
</dependency>

The strange thing is that the executor can find the HBase jars and the Elasticsearch jar, both of which are included as dependencies, but not some of the Elasticsearch classes, so I guess there may be some class conflict. I checked the assembly jar, and it does contain the "missing" class.
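To narrow this down, each executor can be made to report where (if anywhere) it loads the failing class from; a rough sketch, assuming the job's SparkContext is available as sc:

// Sketch of an executor-side check (assumes an existing SparkContext sc): each task
// tries to load the class from the error message and reports which jar it came from.
val origins = sc.parallelize(1 to 2, 2).map { _ =>
  try {
    val cls = Class.forName("org.elasticsearch.common.xcontent.json.JsonXContent")
    val src = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
    "loaded from " + src.getOrElse("<unknown source>")
  } catch {
    case t: Throwable => "failed: " + t
  }
}.collect()
origins.foreach(println)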

1 Answer:

Answer 0 (score: 2)

As I can see, you have already included the jar dependency. You have also commented out the provided scope on the dependency, which means it will be packaged so that your deployment can use it.

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.6.3</version>
</dependency>

What I suspect is the spark-submit itself; please check it against the settings below:

--conf "spark.driver.extraLibrayPath=$HADOOP_HOME/*:$HBASE_HOME/*:$HADOOP_HOME/lib/*:$HBASE_HOME/lib/htrace-core-3.1.0-incubating.jar:$HDFS_PATH/*:$SOLR_HOME/*:$SOLR_HOME/lib/*" \
        --conf "spark.executor.extraLibraryPath=$HADOOP_HOME/*" \
--conf "spark.driver.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')
            --conf "spark.executor.extraClassPath=$(echo /your directory of jars/*.jar | tr ' ' ',')

where /path/to/your/jars is the directory of jars taken from your distribution. You can also print the classpath from within your program, as shown below:

val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
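Since the error is thrown in the executor (the YARN container), it is worth running the same check inside a task so that it reports the executor's classpath rather than the driver's; a rough sketch, assuming a SparkContext sc:

// Sketch (assumes an existing SparkContext sc): collect the classpath entries seen
// by the executor JVMs and print them on the driver.
val executorClasspath = sc.parallelize(1 to 2, 2).flatMap { _ =>
  ClassLoader.getSystemClassLoader
    .asInstanceOf[java.net.URLClassLoader]
    .getURLs
    .map(_.toString)
}.collect().distinct
executorClasspath.foreach(println)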

EDIT: After running the lines above, if you find an old duplicate jar on the classpath, then include your library in your application jar or ship it with --jars, but also try setting spark.{driver,executor}.userClassPathFirst to true.
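For the executor side, that flag can also be set programmatically before the SparkContext is created (the driver-side flag still has to be passed to spark-submit, since the driver JVM is already running by then); a minimal sketch, with the application name assumed from the submit command:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: prefer classes shipped with the application (--jars / spark.jars) over
// the versions already present on the executor classpath.
val conf = new SparkConf()
  .setAppName("TestBat")
  .set("spark.executor.userClassPathFirst", "true")
val sc = new SparkContext(conf)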