Error when passing packages as arguments while converting Kafka messages to a DataFrame
from pyspark.sql import SparkSession, Row
from pyspark.context import SparkContext
from kafka import KafkaConsumer
import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-sql-kafka-0-10_2.11-2.0.2.jar,spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar pyspark-shell'
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)
df = spark \
.read \
.format("kafka") \
.option("kafka.bootstrap.servers", "localhost:9092") \
.option("subscribe", "Jim_Topic") \
.load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
py4j.protocol.Py4JJavaError: An error occurred while calling o28.load. : java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
Answer 0 (score: 1)
This happens because the version of spark-sql-kafka does not match the version of Spark you are currently running.
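For example, the following dependency would work with Spark 2.4.1:
org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1
To fix the problem, put your own Spark version at the end of the dependency string (replacing x.y.z), and make sure the Scala suffix (_2.11 here) also matches your Spark build:
org.apache.spark:spark-sql-kafka-0-10_2.11:x.y.z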
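A minimal sketch of the fix described above, assuming a release build of pyspark (where `pyspark.__version__` is the plain Spark version, such as 2.4.1) and a Scala binary version that you supply yourself. The helper names `kafka_connector_coordinate`, `configure_submit_args` and `read_kafka_topic` are hypothetical. Instead of pointing `--jars` at local files, it builds the Maven coordinate from the installed version and passes it with `--packages` through `PYSPARK_SUBMIT_ARGS`, the same environment variable the question uses.

```python
import os


def kafka_connector_coordinate(spark_version: str, scala_binary_version: str) -> str:
    """Maven coordinate of the Structured Streaming Kafka connector.

    Both the Scala suffix (_2.11, _2.12, ...) and the trailing version must
    match the Spark build that will load the connector.
    """
    return f"org.apache.spark:spark-sql-kafka-0-10_{scala_binary_version}:{spark_version}"


def configure_submit_args(spark_version: str, scala_binary_version: str) -> str:
    """Set PYSPARK_SUBMIT_ARGS so the connector is resolved with --packages.

    Only takes effect if called before the first SparkContext is created.
    """
    coordinate = kafka_connector_coordinate(spark_version, scala_binary_version)
    args = f"--packages {coordinate} pyspark-shell"
    os.environ["PYSPARK_SUBMIT_ARGS"] = args
    return args


def read_kafka_topic(scala_binary_version: str, topic: str = "Jim_Topic"):
    """Batch-read a Kafka topic with a connector matching the installed pyspark.

    Assumption: the caller knows which Scala version their Spark build uses.
    """
    import pyspark  # imported lazily so the helpers above work without Spark
    from pyspark.sql import SparkSession

    configure_submit_args(pyspark.__version__, scala_binary_version)
    spark = SparkSession.builder.appName("Kafka Spark").getOrCreate()
    df = (
        spark.read.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", topic)
        .load()
    )
    return df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```

Call `read_kafka_topic("2.11")` (or whichever Scala version your Spark build uses) in a fresh Python process. `PYSPARK_SUBMIT_ARGS` is only read when the JVM gateway is launched, so changing it after a SparkContext already exists has no effect, and `--packages` needs network access to download the connector and its dependencies.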
Answer 1 (score: 0)
Defining the jars with the following configuration helped me:
from pyspark.sql import SparkSession

# Kafka connector jar; its Spark version (3.0.0-preview2) and Scala suffix (_2.12)
# match the Spark installation it lives in.
kafka_jar = "/C:/Hadoop/Spark/spark-3.0.0-preview2-bin-hadoop2.7/jars/spark-sql-kafka-0-10_2.12-3.0.0-preview2.jar"

spark = SparkSession.builder\
    .appName("Kafka Spark")\
    .config("spark.jars", kafka_jar)\
    .config("spark.executor.extraClassPath", kafka_jar)\
    .config("spark.driver.extraClassPath", kafka_jar)\
    .getOrCreate()
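The connector jar's file name encodes both versions that have to match, so a mismatch such as the question's spark-sql-kafka-0-10_2.11-2.0.2.jar can be caught before the session starts. A minimal sketch, assuming the jar keeps Maven's default `spark-sql-kafka-0-10_<scala>-<spark>.jar` name; `parse_kafka_jar` and `jar_matches_spark` are hypothetical helpers, and `spark_version` must be the version string of your own installation (3.0.0-preview2 for the answer above).

```python
import re
from typing import Optional, Tuple

# Matches e.g. "spark-sql-kafka-0-10_2.12-3.0.0-preview2.jar" (assumed naming scheme)
_JAR_NAME = re.compile(r"spark-sql-kafka-0-10_(?P<scala>\d+\.\d+)-(?P<spark>.+)\.jar$")


def parse_kafka_jar(path: str) -> Optional[Tuple[str, str]]:
    """Return (scala_binary_version, spark_version) from the jar file name, or None."""
    name = re.split(r"[\\/]", path)[-1]  # accept both / and \ separators
    match = _JAR_NAME.match(name)
    if match is None:
        return None
    return match.group("scala"), match.group("spark")


def jar_matches_spark(path: str, spark_version: str,
                      scala_binary_version: Optional[str] = None) -> bool:
    """True if the connector jar was built for this Spark (and, optionally, Scala) version."""
    parsed = parse_kafka_jar(path)
    if parsed is None:
        return False
    scala, spark = parsed
    if spark != spark_version:
        return False
    return scala_binary_version is None or scala == scala_binary_version
```

For example, checking spark-sql-kafka-0-10_2.12-3.0.0-preview2.jar against "3.0.0-preview2" and "2.12" passes, while the question's 2.0.2 jar fails against any other Spark version.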