PySpark Kafka error: Missing application resource

Date: 2020-06-12 16:19:00

Tags: apache-spark pyspark apache-kafka

When I add the following dependency to my code, it triggers the error below:

'--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'

Here is the code:

from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer
import os

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1'


sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

df = spark \
  .read \
  .format("kafka") \
  .option("kafka.bootstrap.servers", "localhost:9092") \
  .option("subscribe", "Jim_Topic") \
  .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

Here is the error:

Error: Missing application resource.

Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]

1 Answer:

Answer 0 (score: 1)

You also need to provide the name of the Python file at the end of the submit arguments; spark-submit treats it as the application resource it is complaining about:

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1 your_python_file.py'
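
If the script is launched with plain python (so pyspark spawns spark-submit internally) rather than passed to spark-submit itself, a common variant is to end the arguments with the special resource name pyspark-shell instead of a file name. A minimal sketch, not part of the original answer:

import os

# Must be set before the SparkContext is created. 'pyspark-shell' is the
# resource name pyspark expects when it launches spark-submit internally;
# without it (or a script name) spark-submit reports
# "Error: Missing application resource."
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,'
    'org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.1.1 '
    'pyspark-shell'
)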

Alternatively, a better approach is to set the dependency through SparkConf:

from pyspark import SparkConf, SparkContext

# Point Spark at a jar on disk; the config must be set before the context exists.
conf = SparkConf().set("spark.jars", "/path/to/your/jar")
sc = SparkContext(conf=conf)
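
Putting the pieces together, here is a minimal end-to-end sketch under the question's assumptions (broker at localhost:9092, topic Jim_Topic). It uses spark.jars.packages, which has Spark resolve the connector from Maven coordinates at startup instead of requiring a jar path on disk:

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Resolve the Kafka connector from Maven; coordinates taken from the question.
# Like spark.jars, this must be set before the session is created.
conf = SparkConf().set(
    "spark.jars.packages",
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0",
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

# Batch-read the topic; Kafka delivers key/value as binary, so cast to strings.
df = (
    spark.read.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "Jim_Topic")
    .load()
)
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()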