I'm running the following in my code:
kafka_streams = [KafkaUtils.createStream(ssc, zk_settings['QUORUM'], zk_settings['CONSUMERS'][k],
{zk_settings['TOPICS'][0]: zk_settings['NUM_THREADS']})
.window(zk_settings['WINDOW_DURATION'], zk_settings['SLIDE_DURATION'])
for k in range(len(zk_settings['CONSUMERS']))]
But I get the following error:
Exception in thread "Thread-3" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2625)
at java.lang.Class.privateGetPublicMethods(Class.java:2743)
at java.lang.Class.getMethods(Class.java:1480)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:365)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:317)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
at py4j.Gateway.invoke(Gateway.java:252)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
Am I missing something?
I ran into some Spark errors earlier, so I rebuilt Spark, and that led to this error.
Answer 0 (score: 2)
You should add --packages when you submit your code:
./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 <DIR>/main.py localhost:9092 test
https://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html
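If you can't change the spark-submit command line itself, a commonly used alternative is to set the PYSPARK_SUBMIT_ARGS environment variable before the SparkContext is created. This is a sketch, assuming the same Spark/Kafka versions as above; the trailing "pyspark-shell" token is required for PySpark to pick the arguments up:

```python
import os

# Assumption: setting PYSPARK_SUBMIT_ARGS before the first SparkContext is
# created has the same effect as passing --packages on the command line.
os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 "
    "pyspark-shell"
)

# Only after this, create the contexts:
# from pyspark import SparkContext
# from pyspark.streaming import StreamingContext
# sc = SparkContext(appName="kafka-stream")
# ssc = StreamingContext(sc, 10)
```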
Answer 1 (score: 1)
I also had this problem, because the spark-streaming-kafka jar I had downloaded was not an assembly jar. I solved it as follows:
First, download the assembly spark-streaming-kafka...jar with wget:
wget https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11/2.2.0/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar
In my case I was using spark-2.2.0, so browse https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-8-assembly_2.11 to find the jar matching your version.
Then submit with:
spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar myApp.py
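To confirm that a jar you downloaded really is an assembly jar, you can check whether it bundles the missing class; jars are just zip archives. A minimal sketch (the jar path in the usage comment is a placeholder for wherever you saved the file):

```python
import zipfile

def jar_contains_class(jar_path, class_name):
    """Return True if the jar bundles the given fully-qualified class."""
    # A class a.b.C is stored in the jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()

# Usage (path is a placeholder):
# jar_contains_class(
#     "spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar",
#     "kafka.common.TopicAndPartition",
# )
```

If this returns False, the jar is not the assembly variant and spark-submit will hit the same NoClassDefFoundError.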
Answer 2 (score: 0)
A classNotFoundException means that when spark-submit runs your program, it cannot find the required class kafka.common.TopicAndPartition on the classpath.
Take a look at the usage of the spark-submit command:
# spark-submit --help
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
--master MASTER_URL spark://host:port, mesos://host:port, yarn, or local.
--deploy-mode DEPLOY_MODE Whether to launch the driver program locally ("client") or
on one of the worker machines inside the cluster ("cluster")
(Default: client).
--class CLASS_NAME Your application's main class (for Java / Scala apps).
--name NAME A name of your application.
--jars JARS Comma-separated list of local jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
Add the --jars option with the local paths of the kafka jars, like this:
# spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 --jars /path/to/org.apache.kafka_kafka_2.11-0.8.2.1.jar,/path/to/com.yammer.metrics_metrics-core-2.2.0.jar your_python_script.py
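Note that --jars expects a single comma-separated list (see the help text above), which is easy to get wrong with several jars. A hypothetical helper to build that value from a directory of downloaded jars (the directory layout is an assumption, not something from Spark itself):

```python
import glob
import os

def jars_argument(jar_dir):
    """Build the comma-separated list spark-submit expects for --jars."""
    jars = sorted(glob.glob(os.path.join(jar_dir, "*.jar")))
    return ",".join(jars)

# e.g. pass the result as:
# spark-submit --jars "<output of jars_argument('jars/')>" your_python_script.py
```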