I am trying to create two DataFrames and join them using the DataFrame.join method.
Here is the Scala code:
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf

object RuleExecutor {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(AppConstants.AppName).setMaster("local")
    val sparkSession = SparkSession.builder().appName(AppConstants.AppName).config(sparkConf).enableHiveSupport().getOrCreate()
    import sparkSession.sql

    sql(s"CREATE DATABASE test")
    sql("CREATE TABLE test.box_width (id INT, width INT)")   // Create table box_width
    sql("INSERT INTO test.box_width VALUES (1,1), (2,2)")    // Insert data into box_width
    sql("CREATE TABLE test.box_length (id INT, length INT)") // Create table box_length
    sql("INSERT INTO test.box_length VALUES (1,10), (2,20)") // Insert data into box_length

    val widthDF = sql("select * from test.box_width")   // Get DF for table box_width
    val lengthDF = sql("select * from test.box_length") // Get DF for table box_length
    val dimensionDF = lengthDF.join(widthDF, "id"); // Join on the id column
    dimensionDF.show();
  }
}
But when I run the code, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)…..
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)……
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)……
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)……
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)…
Caused by: org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The class "org.apache.hadoop.hive.metastore.model.MVersionTable" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found.
NestedThrowables:
org.datanucleus.exceptions.ClassNotPersistableException: The class "org.apache.hadoop.hive.metastore.model.MVersionTable" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found.
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:473)……
Caused by: org.datanucleus.exceptions.ClassNotPersistableException: The class "org.apache.hadoop.hive.metastore.model.MVersionTable" is not persistable. This means that it either hasnt been enhanced, or that the enhanced version of the file is not in the CLASSPATH (or is hidden by an unenhanced version), or the Meta-Data/annotations for the class are not found.
at org.datanucleus.ExecutionContextImpl.assertClassPersistable(ExecutionContextImpl.java:5113)……
The versions I am using are:
Scala = 2.11
Spark-Hive = 2.2.2
Maven-org-spark-project-hive_hive-metastore = 1.x
DataNucleus = 5.x
How can I resolve this issue? (complete log, list of dependencies)
Thanks
Answer 0: (score: 1)
First, when writing Scala code you no longer need a ; at the end of a line, unless the line contains more than one expression.
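To illustrate the semicolon rule, here is a minimal sketch. The object name SemicolonDemo and the values in it are hypothetical and chosen only for this example; they are not taken from the question's code.

```scala
object SemicolonDemo {
  // One expression per line: no trailing semicolon is needed.
  val width = 2
  val length = 20

  // Two expressions on the same line: the semicolon separates them.
  val area = { val w = width; w * length }
}
```

Applied to the question's code, this means the trailing ; after the join and show() calls can simply be dropped.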
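Second, I went through your log and found 15 errors, most of them saying that a database table does not exist or that Hive cannot be found. So I think those instances are not running properly. Can you make sure that all of them (Hive, MySQL DB) are set up correctly before you run the Spark job?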
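One possible way to check the setup before running the real job is a small standalone program that does nothing but open a Hive-enabled session and list the databases. This is only a sketch under assumptions: the object name MetastoreCheck and the app name are hypothetical, local mode is assumed as in the question, and the same Spark and Hive dependencies as the job must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success, Try}

// Hypothetical sanity check, not part of the original job.
object MetastoreCheck {
  def main(args: Array[String]): Unit = {
    // Creating the session and listing databases both go inside Try,
    // because the metastore client may be instantiated at either step.
    val result = Try {
      val spark = SparkSession.builder()
        .appName("MetastoreCheck") // assumed name, for illustration only
        .master("local")
        .enableHiveSupport()
        .getOrCreate()
      // SHOW DATABASES forces Spark to talk to the Hive metastore.
      val names = spark.sql("SHOW DATABASES").collect().map(_.getString(0))
      spark.stop()
      names
    }

    result match {
      case Success(names) => println(s"Metastore reachable, databases: ${names.mkString(", ")}")
      case Failure(e)     => println(s"Metastore not usable: ${e.getMessage}")
    }
  }
}
```

If this check already fails with the same HiveSessionStateBuilder error, the problem lies in the environment or the dependencies rather than in the join itself. Note that Try only catches non-fatal exceptions, so a linkage error such as NoClassDefFoundError would still terminate the program.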