I can run the code below in the pyspark shell, but when I submit the application through Spark's hidden REST API, an error occurs.
I start the pyspark shell with:
export PYSPARK_PYTHON=/home/xuananh/data/repo/demo/.venv/python3/bin/python
pyspark --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0
Then I submit my Spark application through the hidden REST API with:
curl -X POST http://localhost:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action": "CreateSubmissionRequest",
  "appArgs": [
    "/home/xuananh/data/repo/demo/spark_app/demo.py"
  ],
  "appResource": "/home/xuananh/data/repo/demo/spark_app/demo.py",
  "clientSparkVersion": "2.2.0",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1",
    "PYSPARK_PYTHON": "/home/xuananh/data/repo/demo/.venv/python3/bin/python"
  },
  "mainClass": "org.apache.spark.deploy.SparkSubmit",
  "sparkProperties": {
    "spark.driver.supervise": "false",
    "spark.app.name": "Simple App",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode": "cluster",
    "spark.master": "spark://localhost:6066",
    "spark.driver.extraClassPath": "/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar",
    "spark.executor.extraClassPath": "/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar"
  }
}'
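Note that the shell pulls the Kafka dependency with --packages, while the REST payload above only sets the extraClassPath entries. For comparison, here is a sketch of how the sparkProperties block could declare the same dependency via spark.jars.packages (Spark's property equivalent of --packages); this is only an illustration of an alternative, not the setup actually used in the question:

  "sparkProperties": {
    "spark.driver.supervise": "false",
    "spark.app.name": "Simple App",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode": "cluster",
    "spark.master": "spark://localhost:6066",
    "spark.jars.packages": "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0"
  }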
Here is the code:
# Imports needed to run this snippet (PySpark plus the kafka-0-8 streaming package)
from pyspark import SparkContext
from pyspark.streaming.kafka import KafkaUtils, OffsetRange

topic_name = 'test'
kafka_partition = 0
kafka_from_offset = 0
kafka_last_offset = 100

# Read exactly offsets [0, 100) of partition 0 of the 'test' topic
effective_offset_list = []
effective_offset_list.append(OffsetRange(
    topic=topic_name,
    partition=kafka_partition,
    fromOffset=kafka_from_offset,
    untilOffset=kafka_last_offset
))

kafka_params = {
    "zookeeper.connect": "localhost:2181",
    "metadata.broker.list": "localhost:9092",
    "zookeeper.connection.timeout.ms": "10000"
}

spark_context = SparkContext.getOrCreate()
rdd = KafkaUtils.createRDD(spark_context, kafka_params, effective_offset_list)
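The Py4J signature in the traceback below shows all four createOffsetRange arguments arriving as java.lang.String, whereas the snippet above hard-codes integers. One possibility (an assumption, since the full demo.py is not shown) is that in the submitted script these values come from command-line arguments or a config file as strings. A minimal sketch of coercing such values before building the OffsetRange; build_offset_range is a hypothetical helper, not code from demo.py:

# Hypothetical helper: coerce externally supplied values (e.g. from sys.argv or a
# config file) to the types OffsetRange expects -- topic as str, the rest as int.
def build_offset_range(topic, partition, from_offset, until_offset):
    return OffsetRange(
        topic=str(topic),
        partition=int(partition),
        fromOffset=int(from_offset),
        untilOffset=int(until_offset)
    )

# Example usage with string inputs, as they might arrive from arguments
effective_offset_list = [build_offset_range('test', '0', '0', '100')]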
Here is the error message:
Traceback (most recent call last):
  File "/home/xuananh/data/repo/demo/spark_app/demo.py", line 219, in <module>
    rdd = KafkaUtils.createRDD(spark_context, kafka_params, effective_offset_list)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 177, in createRDD
    joffsetRanges = [o._jOffsetRange(helper) for o in offsetRanges]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 177, in <listcomp>
    joffsetRanges = [o._jOffsetRange(helper) for o in offsetRanges]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 260, in _jOffsetRange
    self.untilOffset)
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o22.createOffsetRange. Trace:
py4j.Py4JException: Method createOffsetRange([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:272)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Does anyone recognize this error? Please help!