KafkaUtils.createRDD works in the pyspark shell but fails when the application is submitted

Date: 2019-08-07 11:05:47

Tags: python apache-spark pyspark spark-streaming

I can run the code below in the pyspark shell, but when I submit the application through the hidden REST API, an error occurs.

I start the pyspark shell with:

export PYSPARK_PYTHON=/home/xuananh/data/repo/demo/.venv/python3/bin/python
pyspark --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0

Then I submit my Spark application through Spark's hidden REST API with:

curl -X POST http://localhost:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
   "action":"CreateSubmissionRequest",
   "appArgs":[
      "/home/xuananh/data/repo/demo/spark_app/demo.py"
   ],
   "appResource":"/home/xuananh/data/repo/demo/spark_app/demo.py",
   "clientSparkVersion":"2.2.0",
   "environmentVariables":{
      "SPARK_ENV_LOADED":"1",
      "PYSPARK_PYTHON": "/home/xuananh/data/repo/demo/.venv/python3/bin/python"
   },
   "mainClass":"org.apache.spark.deploy.SparkSubmit",
   "sparkProperties":{
      "spark.driver.supervise":"false",
      "spark.app.name":"Simple App",
      "spark.eventLog.enabled":"true",
      "spark.submit.deployMode":"cluster",
      "spark.master":"spark://localhost:6066",
      "spark.driver.extraClassPath" : "/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar",
      "spark.executor.extraClassPath" : "/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar"
   }
}' 
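
Since the REST endpoint only rejects a malformed body at submit time, it can help to round-trip the request JSON locally first. The sketch below (standalone, no live master needed) parses the same body as the curl call above and prints the configured master, so quoting mistakes surface before the POST:

```python
import json

# The request body from the curl call above; json.loads raises
# JSONDecodeError if the body is malformed, before it ever reaches Spark.
body = '''{
   "action":"CreateSubmissionRequest",
   "appArgs":["/home/xuananh/data/repo/demo/spark_app/demo.py"],
   "appResource":"/home/xuananh/data/repo/demo/spark_app/demo.py",
   "clientSparkVersion":"2.2.0",
   "environmentVariables":{
      "SPARK_ENV_LOADED":"1",
      "PYSPARK_PYTHON":"/home/xuananh/data/repo/demo/.venv/python3/bin/python"
   },
   "mainClass":"org.apache.spark.deploy.SparkSubmit",
   "sparkProperties":{
      "spark.driver.supervise":"false",
      "spark.app.name":"Simple App",
      "spark.eventLog.enabled":"true",
      "spark.submit.deployMode":"cluster",
      "spark.master":"spark://localhost:6066",
      "spark.driver.extraClassPath":"/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar",
      "spark.executor.extraClassPath":"/home/xuananh/data/Downloads/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar"
   }
}'''

request = json.loads(body)  # fails loudly here if the JSON is broken
print(request["sparkProperties"]["spark.master"])  # spark://localhost:6066
```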

Here is the code:

# Imports needed for the snippet to run (omitted in the original post)
from pyspark import SparkContext
from pyspark.streaming.kafka import KafkaUtils, OffsetRange

topic_name = 'test'
kafka_partition = 0
kafka_from_offset = 0
kafka_last_offset = 100

# Collect the offset range(s) to read from Kafka
effective_offset_list = []
effective_offset_list.append(OffsetRange(
    topic=topic_name,
    partition=kafka_partition,
    fromOffset=kafka_from_offset,
    untilOffset=kafka_last_offset
))

kafka_params = {
    "zookeeper.connect": "localhost:2181",
    "metadata.broker.list": "localhost:9092",
    "zookeeper.connection.timeout.ms": "10000"
}

spark_context = SparkContext.getOrCreate()
rdd = KafkaUtils.createRDD(spark_context, kafka_params, effective_offset_list)

Here is the error message:

Traceback (most recent call last):
  File "/home/xuananh/data/repo/demo/spark_app/demo.py", line 219, in <module>
    rdd = KafkaUtils.createRDD(spark_context, kafka_params, effective_offset_list)
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 177, in createRDD
    joffsetRanges = [o._jOffsetRange(helper) for o in offsetRanges]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 177, in <listcomp>
    joffsetRanges = [o._jOffsetRange(helper) for o in offsetRanges]
  File "/opt/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 260, in _jOffsetRange
    self.untilOffset)
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o22.createOffsetRange. Trace:
py4j.Py4JException: Method createOffsetRange([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:272)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
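
One detail worth noting from the trace: the JVM rejects createOffsetRange([String, String, String, String]), meaning all four arguments arrived as java.lang.String, while the snippet above passes Python int literals for the partition and offsets. If the deployed demo.py instead builds those values from strings (for example from command-line arguments or a config file), Py4J forwards them to the JVM as Strings and no matching overload exists. A standalone sketch of casting them first (the argument vector here is hypothetical, not taken from the original script):

```python
# Hypothetical argument vector; demo.py's real inputs are not shown in the post.
argv = ["demo.py", "test", "0", "0", "100"]

topic_name = argv[1]              # the topic is legitimately a str
kafka_partition = int(argv[2])    # cast: Py4J maps int -> Integer, str -> String
kafka_from_offset = int(argv[3])  # cast before building the OffsetRange
kafka_last_offset = int(argv[4])

print(type(kafka_partition).__name__)  # int
```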

Does anyone recognize this error? Any help is appreciated!

0 Answers:

No answers yet