I have a Scala object that queries MySQL tables, joins them, and writes the result to S3. When I test the code locally it runs fine, but when I submit it to the cluster it throws the following error:
Exception in thread "main" java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
    at QuaterlyAudit$.main(QuaterlyAudit.scala:51)
    at QuaterlyAudit.main(QuaterlyAudit.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
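For reference, here is a minimal sketch of the kind of JDBC read that fails at QuaterlyAudit.scala:51; the URL, table name, credentials, and S3 path are placeholders, not the actual code:

import org.apache.spark.sql.SparkSession

object QuaterlyAudit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("QuaterlyAudit").getOrCreate()

    // Runs fine locally because the MySQL connector is on the local classpath;
    // on the cluster, DriverManager finds no driver registered for the jdbc:mysql URL.
    val auditDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://mysql-host:3306/mydb") // placeholder host/db
      .option("dbtable", "audit_table")                   // placeholder table
      .option("user", "dbuser")
      .option("password", "dbpass")
      .load()

    auditDF.write.parquet("s3a://my-bucket/quarterly-audit/") // placeholder bucket
  }
}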
Below is my spark-submit command:
nohup spark-submit --class QuaterlyAudit --master yarn-client --num-executors 8
--driver-memory 16g --executor-memory 20g --executor-cores 10 /mypath/campaign.jar &
I am using sbt and I include the MySQL connector in the sbt assembly. Below is my build.sbt file:
name := "mobilewalla"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
"org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
"org.apache.hadoop" % "hadoop-aws" % "2.6.0" intransitive(),
"mysql" % "mysql-connector-java" % "5.1.37")
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) =>
    xs.map(_.toLowerCase) match {
      case ("manifest.mf" :: Nil) |
           ("index.list" :: Nil) |
           ("dependencies" :: Nil) |
           ("license" :: Nil) |
           ("notice" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.first // was 'discard' previously
    }
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
assemblyJarName in assembly := "campaign.jar"
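(These assembly settings come from the sbt-assembly plugin. Assuming it is not already declared, project/plugins.sbt would need a line like the following; the exact version here is a guess:)

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")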
I also tried:
nohup spark-submit --driver-class-path /mypath/mysql-connector-java-5.1.37.jar
--class QuaterlyAudit --master yarn-client --num-executors 8 --driver-memory 16g
--executor-memory 20g --executor-cores 10 /mypath/campaign.jar &
But still no luck. What am I missing here?
Answer 0 (score: 0)
Clearly, Spark is not able to find the JDBC jar. Plenty of people hit this issue; it happens because the jar is not shipped to the driver and the executors. There are a couple of ways to fix it:

1. Add the dependency on the spark-submit command line, using an option like:
--jars $(echo ./lib/*.jar | tr ' ' ',')

2. Set spark.driver.extraClassPath and spark.executor.extraClassPath, and give these properties the path of the jar file. Make sure the same path exists on the worker nodes.
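Separately from shipping the jar, this particular stack trace (DriverManager.getDriver inside JdbcUtils.createConnectionFactory) also goes away in many Spark 2.x setups if the driver class is named explicitly in the JDBC options, since that bypasses the DriverManager lookup. A minimal sketch with placeholder connection details, not the asker's actual code:

// Assumes `spark` is the SparkSession and the MySQL connector jar is already
// in the assembly or passed via --jars.
val auditDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://mysql-host:3306/mydb") // placeholder
  .option("dbtable", "audit_table")                   // placeholder
  .option("user", "dbuser")
  .option("password", "dbpass")
  // Naming the driver class skips the DriverManager lookup that throws
  // "No suitable driver" even when the jar is on the classpath.
  .option("driver", "com.mysql.jdbc.Driver")
  .load()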
Answer 1 (score: 0)
You have to specify the packages like this:
spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4,mysql:mysql-connector-java:5.1.6 your-jar.jar
(--packages resolves the listed Maven coordinates and puts them on the driver and executor classpaths; the spark-avro coordinate is just part of the example, and the mysql coordinate is the one that provides the missing JDBC driver.)