We are currently testing a Structured Streaming Kafka driver. Submitting on YARN (2.7.3) with --packages 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0' works without issue. However, when we try to launch on Spark standalone with deploy mode = cluster, we get

ClassNotFoundException: Failed to find data source: kafka

even though the launch command adds the Kafka jars to -Dspark.jars (see below), and subsequent log messages indicate those jars were added successfully.
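For context, the failing submit looks roughly like this. This is a sketch, not the exact command we ran: the master URL (spark://spark-master:7077) and application jar name (our-app.jar) are placeholders; the --class name is taken from the stack trace below.

```shell
# Works on YARN; fails on standalone with deploy-mode=cluster.
# spark-master:7077 and our-app.jar are hypothetical placeholders.
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode cluster \
  --packages 'org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0' \
  --class com.dematic.labs.analytics.diagnostics.spark.drivers.StructuredStreamingSignalCount \
  our-app.jar
```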
All 10 jars are present in /home/spark/.ivy2 on every node. I manually verified that the KafkaSourceProvider class exists inside org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar. I also confirmed the jars themselves are fine by launching the driver on YARN without the --packages option and adding all 10 jars manually with the --jars option.
The nodes run Scala 2.11.8.

Any insight is appreciated.
Jars added automatically by spark-submit:
-Dspark.jars=file:/home/spark/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar,file:/home/spark/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar,file:/home/spark/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar,file:/home/spark/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar,file:/home/spark/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar,file:/home/spark/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar,file:/home/spark/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar,file:/home/spark/.ivy2/jars/org.scalatest_scalatest_2.11-2.2.6.jar,file:/home/spark/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar,file:/home/spark/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar
Spark INFO messages showing these jars apparently being loaded:
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar at spark://10.102.22.23:50513/jars/org.apache.spark_spark-sql-kafka-0-10_2.11-2.1.0.jar with timestamp 1485467844922
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar at spark://10.102.22.23:50513/jars/org.apache.kafka_kafka-clients-0.10.0.1.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar at spark://10.102.22.23:50513/jars/org.apache.spark_spark-tags_2.11-2.1.0.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar at spark://10.102.22.23:50513/jars/org.spark-project.spark_unused-1.0.0.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar at spark://10.102.22.23:50513/jars/net.jpountz.lz4_lz4-1.3.0.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar at spark://10.102.22.23:50513/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar at spark://10.102.22.23:50513/jars/org.slf4j_slf4j-api-1.7.16.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.scalatest_scalatest_2.11-2.2.6.jar at spark://10.102.22.23:50513/jars/org.scalatest_scalatest_2.11-2.2.6.jar with timestamp 1485467844923
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.scala-lang_scala-reflect-2.11.8.jar at spark://10.102.22.23:50513/jars/org.scala-lang_scala-reflect-2.11.8.jar with timestamp 1485467844924
17/01/26 21:57:24 INFO SparkContext: Added JAR file:/home/spark/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar at spark://10.102.22.23:50513/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar with timestamp 1485467844924
Error message:
Caused by: java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:569)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:197)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
at com.dematic.labs.analytics.diagnostics.spark.drivers.StructuredStreamingSignalCount$.main(StructuredStreamingSignalCount.scala:76)
at com.dematic.labs.analytics.diagnostics.spark.drivers.StructuredStreamingSignalCount.main(StructuredStreamingSignalCount.scala)
... 6 more
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource