java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

Date: 2016-03-10 07:43:53

Tags: java apache-spark apache-kafka

In my code I execute the following:

from pyspark.streaming.kafka import KafkaUtils

kafka_streams = [KafkaUtils.createStream(ssc, zk_settings['QUORUM'], zk_settings['CONSUMERS'][k],
                                         {zk_settings['TOPICS'][0]: zk_settings['NUM_THREADS']})
                           .window(zk_settings['WINDOW_DURATION'], zk_settings['SLIDE_DURATION'])
                 for k in range(len(zk_settings['CONSUMERS']))]
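For context, `zk_settings` is not shown in the question; a minimal sketch of what it presumably looks like, with entirely hypothetical values, and of how the comprehension expands into one (consumer group, topic map) pair per stream:

```python
# Hypothetical zk_settings for the snippet above -- all values are
# illustrative placeholders, not taken from the question.
zk_settings = {
    'QUORUM': 'localhost:2181',           # ZooKeeper quorum
    'CONSUMERS': ['group-0', 'group-1'],  # one consumer group per stream
    'TOPICS': ['my-topic'],
    'NUM_THREADS': 2,                     # receiver threads per topic
    'WINDOW_DURATION': 60,                # seconds
    'SLIDE_DURATION': 10,                 # seconds
}

# The comprehension builds one (group, {topic: threads}) pair per
# consumer group; each pair feeds one KafkaUtils.createStream call.
stream_args = [
    (zk_settings['CONSUMERS'][k],
     {zk_settings['TOPICS'][0]: zk_settings['NUM_THREADS']})
    for k in range(len(zk_settings['CONSUMERS']))
]
```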

But I get the following error:

Exception in thread "Thread-3" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2625)
at java.lang.Class.privateGetPublicMethods(Class.java:2743)
at java.lang.Class.getMethods(Class.java:1480)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:365)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:317)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more

Am I missing something?

I had been running into some Spark errors, so I rebuilt Spark, and that led to this error.

3 Answers:

Answer 0 (score: 2)

You should add --packages when you submit your code:

 ./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 <DIR>/main.py localhost:9092 test

https://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html
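The Maven coordinate passed to --packages has to match both the Scala build and the Spark version of your installation, or the jar still won't resolve. A small sketch of how the coordinate is composed (the version values here are assumptions; substitute your own):

```python
# groupId:artifactId_scalaVersion:sparkVersion
# Assumed versions -- replace with the ones your cluster actually runs.
spark_version = "2.2.0"
scala_version = "2.11"

coord = "org.apache.spark:spark-streaming-kafka-0-8_{}:{}".format(
    scala_version, spark_version)
```

The resulting string is exactly what goes after `--packages` on the spark-submit command line.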

Answer 1 (score: 1)

I also had this problem; the cause was that the spark-streaming-kafka jar I had downloaded was not an assembly jar. I solved it as follows:

  1. First download the assembly jar spark-streaming-kafka...jar:

     wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11/2.2.0/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar

In my case I was using spark-2.2.0, so browse the URL https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11 to find the jar that matches your version.

  2. Then submit with:

     spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar myApp.py

Answer 2 (score: 0)

A ClassNotFoundException means that the program spark-submit is running cannot find the required class, kafka.common.TopicAndPartition, on its classpath.

Take a look at the usage of the spark-submit command:

# spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.

Add the --jars option with the local paths to the kafka jars, like this:

# spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/org.apache.kafka_kafka_2.11-0.8.2.1.jar,/path/to/com.yammer.metrics_metrics-core-2.2.0.jar your_python_script.py
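Note that, per the help text above, --jars takes a comma-separated (not space-separated) list. A small sketch of assembling that argument programmatically, using hypothetical paths:

```python
# Hypothetical local jar paths -- substitute the real locations on your machine.
jars = [
    "/path/to/org.apache.kafka_kafka_2.11-0.8.2.1.jar",
    "/path/to/com.yammer.metrics_metrics-core-2.2.0.jar",
]

# --jars expects one comma-separated string, with no spaces between entries.
jars_arg = "--jars " + ",".join(jars)
```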