Spark Databricks cluster NoClassDefFoundError during insert

Date: 2019-06-18 21:43:29

Tags: apache-spark hive noclassdeffounderror databricks hive-serde

I am experimenting with a Databricks Spark cluster. When creating a table in a Hive database, I hit the following error the first time:

19/06/18 21:34:17 ERROR SparkExecuteStatementOperation: Error running hive query: 
org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: org/joda/time/ReadWritableInstant
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:296)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2$$anonfun$run$2.apply$mcV$sp(SparkExecuteStatementOperation.scala:182)
    at org.apache.spark.sql.hive.thriftserver.server.SparkSQLUtils$class.withLocalProperties(SparkSQLOperationManager.scala:190)

On subsequent attempts to create the same table (without restarting the cluster), I get:

org.apache.hive.service.cli.HiveSQLException: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyPrimitiveObjectInspectorFactory
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:296)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2$$anonfun$run$2.apply$mcV$sp(SparkExecuteStatementOperation.scala:182)
    at org.apache.spark.sql.hive.thriftserver.server.SparkSQLUtils$class.withLocalProperties(SparkSQLOperationManager.scala:190)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:44)

From beeline (the client), I receive the following errors... essentially the same thing:

13: jdbc:spark://dbc-e1ececb9-10d2.cloud.data> create table test_dnax_db.sample2 (name2 string);
Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: java.lang.NoClassDefFoundError: org/joda/time/ReadWritableInstant, Query: create table test_dnax_db.sample2 (name2 string). (state=HY000,code=500051)
13: jdbc:spark://dbc-e1ececb9-10d2.cloud.data> create table test_dnax_db.sample2 (name2 string);
Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyPrimitiveObjectInspectorFactory, Query: create table test_dnax_db.sample2 (name2 string). (state=HY000,code=500051)

I have tried uploading the relevant joda-time and SerDe jars using the Databricks Libraries feature. I also set the Spark property spark.driver.extraClassPath (since the error comes from the Spark driver, not the workers). Neither helped. I do see the dependent jars available on the host in the /databricks/hive and /databricks/jars folders.
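For reference, the property was set roughly like this in the cluster's Spark config (Advanced Options); the jar filename below is illustrative, not the actual artifact name:

    spark.driver.extraClassPath /databricks/jars/joda-time-<version>.jar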

I have also tried setting environment variables such as HADOOP_CLASSPATH, with no luck.

The infamous Databricks forums are useless, since they are not curated at all (unlike those of comparable commercial products).

Any suggestions are welcome.

I can successfully create the database using the location keyword, as well as query existing tables from the metastore.
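For example, statements along these lines succeed (the bucket path and table name are placeholders, not my actual ones):

    CREATE DATABASE test_dnax_db LOCATION 's3://<some-bucket>/test_dnax_db/';
    SELECT * FROM test_dnax_db.some_existing_table LIMIT 10;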

EDIT:

I suspect that SparkExecuteStatementOperation (the Thrift entry-point class for SQL execution in a Spark cluster, which runs on the driver) may be using a different classloader than the application. I added the code below to a static block in an application class that I know gets initialized, and I do not see a ClassNotFoundException, i.e. the jar is available to the application. But the underlying driver does not see the relevant jar.

static {
        try {
            // Probe whether the joda-time jar is visible to this class's loader.
            Class<?> aClass = Class.forName("org.joda.time.ReadWritableInstant");
        } catch (ClassNotFoundException e) {
            LOG.warn("Unable to find ReadWritableInstant class", e);
        }
}
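To dig further into the classloader hypothesis, a minimal extension of the same probe (assuming the same LOG field as above) would log which loader defined the class, the current thread's context loader, and the physical jar the class came from, so the application-side view can be compared against whatever the Thrift execution path sees:

    static {
            try {
                Class<?> aClass = Class.forName("org.joda.time.ReadWritableInstant");
                // Loader that defined the class vs. the thread's context loader;
                // a mismatch here would support the two-classloader theory.
                LOG.warn("Defining loader: " + aClass.getClassLoader());
                LOG.warn("Context loader: " + Thread.currentThread().getContextClassLoader());
                // Physical location of the jar the class was loaded from (may be null).
                LOG.warn("Code source: " + aClass.getProtectionDomain().getCodeSource());
            } catch (ClassNotFoundException e) {
                LOG.warn("Unable to find ReadWritableInstant class", e);
            }
    }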

0 Answers:

No answers yet.