I recently upgraded to Spark 2.3.0. I have an existing Spark job that previously ran fine on Spark 2.2.0, and I am now hitting a Java AbstractMethodError exception.
My minimal code:
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # used later to create the Kafka stream

if __name__ == "__main__":
    print("Here it is!")
    sc = SparkContext(appName="Tester")
    ssc = StreamingContext(sc, 1)
This works fine on Spark 2.2.0.
With Spark 2.3.0, I get the following exception:
ssc = StreamingContext(sc, 1)
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/streaming/context.py", line 61, in __init__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/streaming/context.py", line 65, in _initialize_context
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/java_gateway.py", line 1428, in __call__
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.6-src.zip/py4j/protocol.py", line 320, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.streaming.api.java.JavaStreamingContext.
: java.lang.AbstractMethodError
at org.apache.spark.util.ListenerBus$class.$init$(ListenerBus.scala:35)
at org.apache.spark.streaming.scheduler.StreamingListenerBus.<init>(StreamingListenerBus.scala:30)
at org.apache.spark.streaming.scheduler.JobScheduler.<init>(JobScheduler.scala:57)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:184)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:76)
at org.apache.spark.streaming.api.java.JavaStreamingContext.<init>(JavaStreamingContext.scala:130)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
I am passing spark-streaming-kafka-0-8_2.11-2.3.0.jar to the spark-submit command via the --packages option.
I also tried spark-streaming-kafka-0-8-assembly_2.11-2.3.0.jar with both the --packages and --jars options.
Python version: 2.7.5
I followed the guide here: https://spark.apache.org/docs/2.3.0/streaming-kafka-0-8-integration.html
The Kafka 0-8 integration for Spark Streaming is deprecated as of 2.3.0, but according to the documentation it is still available.
My command is:
spark-submit --master spark://10.183.0.41:7077 --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.0 Kafka_test.py
Presumably something has changed in Spark's underlying Scala code.
Has anyone run into the same problem?
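Since the question speculates about a change in Spark's Scala internals: an AbstractMethodError thrown during class initialization (here in ListenerBus.$init$) usually means a jar compiled against one Spark/Scala release is being loaded by a runtime with a different binary layout. A minimal sketch of that compatibility rule, using a hypothetical versions_match helper (not part of any Spark API):

```python
# Hypothetical helper illustrating the binary-compatibility rule:
# the Spark artifact given to --packages should match the cluster's
# Spark version down to major.minor, or trait-initialization methods
# such as ListenerBus.$init$ may be missing at runtime.
def versions_match(cluster_version, package_version):
    """Return True when the major.minor components of both versions agree."""
    return cluster_version.split(".")[:2] == package_version.split(".")[:2]

print(versions_match("2.3.0", "2.2.0"))  # a 2.2 jar on a 2.3 cluster is unsafe
print(versions_match("2.3.0", "2.3.0"))  # matching versions are fine
```

In this setup it is worth confirming that the spark2-client under /usr/hdp actually points at a 2.3.0 installation (spark-submit --version), since a 2.2.x runtime loading the 2.3.0 Kafka jar would produce exactly this kind of error.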
Answer 0 (score: 0)
https://spark.apache.org/docs/2.3.0/streaming-kafka-integration.html
Support for Kafka 0.8 has been deprecated since Spark 2.3.0.