spark-submit command including the MySQL connector

Time: 2016-12-14 10:25:41

Tags: scala apache-spark jdbc apache-spark-sql

I have a Scala object file that internally queries a MySQL table, performs a join, and writes the data to S3. When I test my code locally it runs fine, but when I submit it to the cluster it throws the following error:

Exception in thread "main" java.sql.SQLException: No suitable driver
  at java.sql.DriverManager.getDriver(DriverManager.java:315)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:54)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:53)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:123)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:117)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:53)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:122)
  at QuaterlyAudit$.main(QuaterlyAudit.scala:51)
  at QuaterlyAudit.main(QuaterlyAudit.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
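For context, the failing call at QuaterlyAudit.scala:51 in the trace is presumably a JDBC load along these lines (a minimal sketch; the table name, credentials, and S3 path here are hypothetical, not from the original code):

import org.apache.spark.sql.SparkSession

object QuaterlyAudit {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("QuaterlyAudit").getOrCreate()

    // The JDBC read: this is where "No suitable driver" is thrown when the
    // MySQL connector classes are not visible to DriverManager on the driver.
    val audits = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/mydb")  // hypothetical host/db
      .option("dbtable", "audits")                     // hypothetical table
      .option("user", "dbuser")
      .option("password", "dbpass")
      .option("driver", "com.mysql.jdbc.Driver")       // naming the driver class explicitly often helps
      .load()

    audits.write.parquet("s3a://my-bucket/audit-output/")  // hypothetical S3 path
  }
}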

Below is my spark-submit command:

nohup spark-submit --class QuaterlyAudit --master yarn-client --num-executors 8 
--driver-memory 16g --executor-memory 20g --executor-cores 10 /mypath/campaign.jar &

I am using sbt and I include the MySQL connector in the sbt assembly; below is my build.sbt file:

name := "mobilewalla"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.0.0" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided",
  "org.apache.hadoop" % "hadoop-aws" % "2.6.0" intransitive(),
  "mysql" % "mysql-connector-java" % "5.1.37")

assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs@_*) =>
    xs.map(_.toLowerCase) match {
      case ("manifest.mf" :: Nil) |
           ("index.list" :: Nil) |
           ("dependencies" :: Nil) |
           ("license" :: Nil) |
           ("notice" :: Nil) => MergeStrategy.discard
      case _ => MergeStrategy.first // was 'discard' previously
    }
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}
assemblyJarName in assembly := "campaign.jar"
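One quick sanity check, assuming the assembly lands at /mypath/campaign.jar as above, is to confirm that the connector classes actually made it into the fat jar:

jar tf /mypath/campaign.jar | grep -i mysql

If com/mysql/jdbc/Driver.class is listed, the jar itself is complete and the issue is more likely classpath visibility on the driver.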

I also tried:

nohup spark-submit --driver-class-path /mypath/mysql-connector-java-5.1.37.jar 
--class QuaterlyAudit --master yarn-client --num-executors 8 --driver-memory 16g 
--executor-memory 20g --executor-cores 10 /mypath/campaign.jar &

But still no luck. What am I missing here?

2 Answers:

Answer 0 (score: 0)

Clearly, Spark is not able to find the JDBC jar. There are a few ways to fix this; no doubt many people have run into the problem. It happens because the jar is not shipped to the driver and the executors.

  1. You may want to assemble your application with a build manager (Maven, SBT), so you don't need to add the dependencies on the spark-submit CLI.
  2. You can use the following option in the spark-submit CLI: --jars $(echo ./lib/*.jar | tr ' ' ',')
  3. You can also try configuring these two variables in the SPARK_HOME/conf/spark-defaults.conf file: spark.driver.extraClassPath and spark.executor.extraClassPath, setting their values to the path of the jar file (see the sketch after this list). Make sure the same path exists on the worker nodes.
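For option 3, the spark-defaults.conf entries would look roughly like this (the connector path is an assumption; point it at wherever the jar actually lives on each node):

spark.driver.extraClassPath    /mypath/mysql-connector-java-5.1.37.jar
spark.executor.extraClassPath  /mypath/mysql-connector-java-5.1.37.jar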

Answer 1 (score: 0)

You have to specify the packages like this:

spark-submit --packages org.apache.spark:spark-avro_2.11:2.4.4,mysql:mysql-connector-java:5.1.6 your-jar.jar
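With --packages, spark-submit resolves the given Maven coordinates (from the local Ivy cache or Maven Central) and adds the jars to both the driver and executor classpaths, so the connector does not have to be baked into the assembly. In this example only mysql:mysql-connector-java:5.1.6 is relevant to the error; the spark-avro coordinate is incidental.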