java.lang.ClassNotFoundException: Failed to find data source: json

Date: 2016-11-18 11:01:20

Tags: scala apache-spark

I want to deploy my first Spark application on our new cluster. I have written some tests for the Spark code and they all run successfully, but when I submit the application I get the following stack trace:

java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:294)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:249)
at de.wlw.mad.company_analyser.util.WebPageReader$.apply(WebPageReader.scala:9)
at de.wlw.mad.company_analyser.Analyse$.sparkCommand(Analyse.scala:54)
at de.wlw.mad.company_analyser.Analyse$.apply(Analyse.scala:171)
at de.wlw.mad.company_analyser.AnalysisWorker$.listen(AnalysisWorker.scala:18)
at de.wlw.mad.bunny.AbstractListener$1.handleDelivery(AbstractListener.java:46)
at com.rabbitmq.client.impl.ConsumerDispatcher$5.run(ConsumerDispatcher.java:144)
at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:99)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ClassNotFoundException: json.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)

My pom.xml:

    <!-- Hadoop -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.3</version>
    </dependency>

    <!-- Scala Lib -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
    </dependency>

    <!-- Spark Dependencies -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>16.0.1</version>
        <type>jar</type>
    </dependency>

    <!-- JUnit -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>

WebPageCase is a Scala case class.
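
The reader code itself is not shown, but the stack trace confirms that DataFrameReader.json is invoked from WebPageReader$.apply (WebPageReader.scala:9). A minimal sketch of what that call site presumably looks like; the WebPageCase fields are placeholders, and only the spark.read.json call is confirmed by the trace:

    import org.apache.spark.sql.{Dataset, SparkSession}

    // Hypothetical reconstruction: field names are placeholders, not the real schema.
    case class WebPageCase(url: String, content: String)

    object WebPageReader {
      def apply(spark: SparkSession, path: String): Dataset[WebPageCase] = {
        import spark.implicits._
        // "json" is resolved at runtime by DataSource.lookupDataSource,
        // which is exactly where the ClassNotFoundException above is thrown.
        spark.read.json(path).as[WebPageCase]
      }
    }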

The cluster runs in the cloud. With bin/spark-submit I get the same error as when running the job from my local environment. The Spark master and workers were built without Hadoop.
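
Note that the built-in json source ships with spark-sql itself, so no extra package should be required. The "Caused by: java.lang.ClassNotFoundException: json.DefaultSource" line means Spark found no data source registered under the short name json and fell back to loading a class literally named json.DefaultSource. One plausible, unconfirmed cause is a fat-jar build that overwrites the META-INF/services/org.apache.spark.sql.sources.DataSourceRegister registry files when merging dependencies. A small diagnostic sketch that prints which sources are actually registered on the deployed classpath (run it packaged the same way the job is submitted):

    import java.util.ServiceLoader
    import org.apache.spark.sql.sources.DataSourceRegister
    import scala.collection.JavaConverters._

    // Prints the short name of every registered data source ("json", "parquet", ...).
    // If "json" is missing here, the service registry was lost during packaging.
    object ListDataSources extends App {
      ServiceLoader
        .load(classOf[DataSourceRegister])
        .asScala
        .foreach(r => println(r.shortName()))
    }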

Which package am I missing, or what am I doing wrong?

0 Answers:

No answers yet.