Spark Java: java.lang.NoClassDefFoundError

Asked: 2015-07-09 17:55:51

Tags: java json maven apache-spark java-8

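I'm running Spark in standalone mode locally, with Maven as the build tool, and I have declared all the required dependencies for Spark and json-simple. Simple Spark applications such as word count run fine, but as soon as I use JSONParser from the json-simple API, I get a class-not-found error (java.lang.NoClassDefFoundError). I tried adding the jar file through SparkConf and the Spark context, but that didn't help.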

Here is my pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org</groupId>
<artifactId>sparketl</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>sparketl</name>
<url>http://maven.apache.org</url>

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>


</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
</project>

My driver class is:

package org.sparketl.etljobs;

import java.util.Arrays;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

import scala.Tuple2;

/**
 * @author vijith.reddy
 *
 */
public final class SparkEtl {
    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err
                .println("Please use: SparkEtl <master> <input file> <output file>");
            System.exit(1);
        }

        @SuppressWarnings("resource")
        JavaSparkContext spark = new JavaSparkContext(args[0],
                "Json ", System.getenv("SPARK_HOME"),
                JavaSparkContext.jarOfClass(SparkEtl.class));
        //SparkConf sc=new SparkConf();
        //sc.setJars(new String[]{"/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar"});
        spark.addJar("/Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar");
        JavaRDD<String> file = spark.textFile(args[1]);

        FlatMapFunction<String, String> jsonLine = jsonFile -> {
            return Arrays.asList(jsonFile.toLowerCase().split("\\r?\\n"));
        };

        JavaRDD<String> eachLine = file.flatMap(jsonLine);

        PairFunction<String, String, String> mapCountry = eachItem -> {
            JSONParser parser = new JSONParser();
            String country = "";
            try {
                Object obj = parser.parse(eachItem);
                JSONObject jsonObj = (JSONObject) obj;
                country = (String) jsonObj.get("country");
            } catch (Exception e) {
                e.printStackTrace();
            }
            return new Tuple2<String, String>(eachItem, country);
        };

        JavaPairRDD<String, String> pairs = eachLine.mapToPair(mapCountry);

        pairs.sortByKey(true).saveAsTextFile(args[2]);
        System.exit(0);
    }
}

I get the following error in my log:

15/07/08 16:09:17 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/07/08 16:09:17 INFO SparkContext: Added JAR /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar at http://172.16.8.157:52255/jars/json-simple-1.1.1-sources.jar with timestamp 1436396957111
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(110248) called with curMem=0, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 107.7 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(10090) called with curMem=110248, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 9.9 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52257 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 0 from textFile at SparkEtl.java:35
15/07/08 16:09:17 INFO FileInputFormat: Total input paths to process : 1
15/07/08 16:09:17 INFO SparkContext: Starting job: sortByKey at SparkEtl.java:58
15/07/08 16:09:17 INFO DAGScheduler: Got job 0 (sortByKey at SparkEtl.java:58) with 2 output partitions (allowLocal=false)
15/07/08 16:09:17 INFO DAGScheduler: Final stage: ResultStage 0(sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO DAGScheduler: Parents of final stage: List()
15/07/08 16:09:17 INFO DAGScheduler: Missing parents: List()
15/07/08 16:09:17 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58), which has no missing parents
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(5248) called with curMem=120338, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.1 KB, free 265.0 MB)
15/07/08 16:09:17 INFO MemoryStore: ensureFreeSpace(2888) called with curMem=125586, maxMem=278019440
15/07/08 16:09:17 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.8 KB, free 265.0 MB)
15/07/08 16:09:17 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52257 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:17 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/07/08 16:09:17 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at sortByKey at SparkEtl.java:58)
15/07/08 16:09:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/07/08 16:09:18 INFO SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.16.8.157:52260/user/Executor#2100827222]) with ID 0
15/07/08 16:09:18 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 172.16.8.157, PROCESS_LOCAL, 1560 bytes)
15/07/08 16:09:18 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.8.157:52263 with 265.1 MB RAM, BlockManagerId(0, 172.16.8.157, 52263)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.8.157:52263 (size: 2.8 KB, free: 265.1 MB)
15/07/08 16:09:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.8.157:52263 (size: 9.9 KB, free: 265.1 MB)
15/07/08 16:09:19 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 172.16.8.157): java.lang.NoClassDefFoundError: org/json/simple/parser/JSONParser
    at org.sparketl.etljobs.SparkEtl.lambda$main$b9f570ea$1(SparkEtl.java:44)
    at org.sparketl.etljobs.SparkEtl$$Lambda$11/1498038525.call(Unknown Source)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1030)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:42)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:259)
    at org.apache.spark.RangePartitioner$$anonfun$8.apply(Partitioner.scala:257)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$18.apply(RDD.scala:703)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

15/07/08 16:09:19 INFO TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) on executor 172.16.8.157: java.lang.NoClassDefFoundError (org/json/simple/parser/JSONParser) [duplicate 1]

My Spark configuration:

spark.executor.memory   512m
spark.driver.cores      1
spark.driver.memory     512m
spark.driver.extraClassPath   /Users/username/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1-sources.jar

Has anyone run into this problem? If so, what is the solution?

1 Answer:

Answer 0 (score: 3)

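Judging by spark.driver.extraClassPath (and the code), the library supplied to Spark is a sources jar (json-simple-1.1.1-sources.jar). That jar probably contains only .java files (source code, not compiled Java classes), so the executors would have no JSONParser class to load from it.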

Changing it to json-simple-1.1.1.jar (with the full path, of course) should help.
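
One way to check this diagnosis before changing any configuration is to look inside the jar and see whether it holds a compiled entry for the missing class. The following is only a minimal sketch using the JDK's java.util.jar API; the class name JarClassCheck and the method containsClass are hypothetical names made up for this illustration and are not part of Spark or json-simple.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical helper for illustration only; not part of Spark or json-simple.
final class JarClassCheck {

    /**
     * Returns true if the jar contains a compiled .class entry for the given
     * fully qualified class name, e.g. "org.json.simple.parser.JSONParser".
     */
    static boolean containsClass(Path jar, String className) throws IOException {
        // A class a.b.C is stored in a jar under the entry name a/b/C.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: JarClassCheck <jar path> <fully qualified class name>");
            System.exit(1);
        }
        boolean found = containsClass(Paths.get(args[0]), args[1]);
        System.out.println(args[1] + (found ? " is" : " is NOT")
                + " present as a compiled class in " + args[0]);
    }
}
```

Running it once against the -sources.jar path from the question and once against json-simple-1.1.1.jar, with org.json.simple.parser.JSONParser as the class name, should show whether only the second jar carries the compiled class. If so, that second path is the one to pass to addJar and spark.driver.extraClassPath.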