I am trying to connect to a Hive server with the following Scala code.
import java.sql.{Connection, DriverManager}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

def getHiveConnection(): Connection = {
  println("Building Hive connection..")
  val driver = "org.apache.hive.jdbc.HiveDriver"
  val user = "user"
  val pwd = "pwd"
  val url = "jdbc:hive2://ip-00-000-000-000.ec2.internal:00000/dbname;principal=hive/ip-00-000-000-000.ec2.internal@DEV.COM"
  var connection: Connection = null

  // Tell Hadoop's UserGroupInformation to authenticate via Kerberos
  val conf = new Configuration()
  conf.set("hadoop.security.authentication", "Kerberos")
  UserGroupInformation.setConfiguration(conf)

  try {
    println("Setting the driver..")
    Class.forName(driver)
    println("pre connection")
    if ((connection == null) || connection.isClosed()) {
      connection = DriverManager.getConnection(url, user, pwd)
      println("Hive connection established.")
    }
  } catch {
    case cnf: ClassNotFoundException =>
      println("Invalid driver used. Check the settings.")
      cnf.printStackTrace()
    case e: Exception =>
      println("Other exception.")
      e.printStackTrace()
  }
  connection
}
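For context, the connection returned here is meant for HiveQL that Spark itself cannot run (see below). A minimal, hypothetical usage sketch — the statement text and table names are placeholders, not taken from the original code:

// Hypothetical usage of getHiveConnection(): run a statement Spark SQL cannot
// express, e.g. an EXCHANGE PARTITION, then close the connection.
val conn = getHiveConnection()
try {
  val stmt = conn.createStatement()
  // Placeholder SQL for illustration only
  stmt.execute("ALTER TABLE dbname.target EXCHANGE PARTITION (dt='2019-01-01') WITH TABLE dbname.staging")
  stmt.close()
} finally {
  if (conn != null && !conn.isClosed) conn.close()
}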
I build a jar file from the program in IntelliJ and then run it with spark-submit, because I need to run some SQL that Spark does not support.
The spark-submit command:
SPARK_MAJOR_VERSION=2 spark-submit \
  --class com.package.program.Begin \
  --master=yarn \
  --conf spark.ui.port=4090 \
  --driver-class-path /home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --conf spark.jars=/home/username/testlib/inputdir/myjars/hive-jdbc-2.3.5.jar \
  --executor-cores 4 \
  --executor-memory 4G \
  --keytab /home/username/username.keytab \
  --principal username@DEV.COM \
  --files /$SPARK_HOME/conf/hive-site.xml,connection.properties \
  --name Splinter \
  splinter_2.11-0.1.jar
When I submit the code, it fails with the following exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface
To be precise, the exception is thrown at this line:
connection = DriverManager.getConnection(url, user, pwd)
The dependencies I have added in the sbt file are shown below:
name := "Splinter"
version := "0.1"
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.0.0",
"org.apache.spark" %% "spark-sql" % "2.0.0",
"org.json4s" %% "json4s-jackson" % "3.2.11",
"org.apache.httpcomponents" % "httpclient" % "4.5.3",
"org.apache.spark" %% "spark-hive" % "2.0.0",
)
libraryDependencies += "org.postgresql" % "postgresql" % "42.1.4"
libraryDependencies += "org.apache.hadoop" % "hadoop-auth" % "2.6.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.2"
libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "1.2.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-common" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.6.5"
libraryDependencies += "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.6.5" % "provided"
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-common" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-metastore" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-service" % "2.3.5"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.26"
libraryDependencies += "commons-cli" % "commons-cli" % "1.4"
libraryDependencies += "org.apache.hive" % "hive-service-rpc" % "2.1.0"
libraryDependencies += "org.apache.hive" % "hive-cli" % "2.3.5"
libraryDependencies += "org.apache.hive" % "hive-exec" % "2.3.4" excludeAll
ExclusionRule(organization = "org.pentaho")
In addition to the dependencies, I also passed all the jars from that directory via --jars in spark-submit, but none of that worked.
The full exception stack trace is shown below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hive/service/rpc/thrift/TCLIService$Iface
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at com.data.stages.ExchangePartition.getHiveConnection(ExchangePartition.scala:30)
at com.data.stages.ExchangePartition.exchange(ExchangePartition.scala:44)
at com.partition.source.Pickup$.main(Pickup.scala:124)
at com.partition.source.Pickup.main(Pickup.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.service.rpc.thrift.TCLIService$Iface
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 16 more
Could anyone let me know which dependencies are missing from my sbt file? If none are missing, what mistake am I making here? The same kind of code, with the same libraries (dependencies) in the project, works in Java, so I don't understand what the problem is. Any help is much appreciated.
Answer 0 (score: 4)
I don't know whether you are running spark-submit in client mode or cluster mode.

"Could anyone let me know which dependencies I am missing in the sbt file?"

The dependency you added for the Hive JDBC driver is correct:
libraryDependencies += "org.apache.hive" % "hive-jdbc" % "2.3.5"
I would suggest you use an uber jar, i.e. package all the jars together with their dependencies into a single jar, so that nothing is missing or left out.
See here for how to build an uber jar.
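For example, a minimal sketch using the sbt-assembly plugin; the plugin version and the merge strategy shown here are assumptions and may need adjusting for your project:

// project/plugins.sbt — pull in the sbt-assembly plugin (example version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt — a common way to resolve duplicate META-INF entries when assembling
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Running sbt assembly then produces a single fat jar that you can pass to spark-submit in place of splinter_2.11-0.1.jar; it is also common to mark the spark-* dependencies as "provided" so they are not bundled into it.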
Also add the following code to your driver; it shows which jars are on your classpath.
// Print every URL (jar) visible from the current classloader chain.
urlsinclasspath(getClass.getClassLoader).foreach(println)

// Walk up the classloader hierarchy and collect the URLs of every URLClassLoader.
def urlsinclasspath(cl: ClassLoader): Array[java.net.URL] = cl match {
  case null => Array()
  case u: java.net.URLClassLoader => u.getURLs() ++ urlsinclasspath(cl.getParent)
  case _ => urlsinclasspath(cl.getParent)
}