Spark: Hive INSERT OVERWRITE throws ClassNotFoundException

Date: 2015-03-07 16:26:29

Tags: hadoop hive apache-spark hiveql apache-spark-sql

I have this code that saves a SchemaRDD (person) into a table stored as Parquet (person_parquet):

      hiveContext.sql("insert overwrite table person_parquet select * from person")

But it throws an error:

    java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:399)
        at org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:867)
        at org.apache.hadoop.hive.ql.session.SessionState.getUserFromAuthenticator(SessionState.java:589)
        at org.apache.hadoop.hive.ql.metadata.Table.getEmptyTable(Table.java:174)
        at org.apache.hadoop.hive.ql.metadata.Table.<init>(Table.java:116)
        at org.apache.hadoop.hive.ql.metadata.Hive.newTable(Hive.java:2566)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:917)
        at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1464)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:243)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:137)
        at org.apache.spark.sql.execution.Command$class.execute(commands.scala:46)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.execute(InsertIntoHiveTable.scala:51)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
        at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
        at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
        at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
        at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:94)
        at com.example.KafkaConsumer$$anonfun$main$2.apply(KafkaConsumer.scala:114)
        at com.example.KafkaConsumer$$anonfun$main$2.apply(KafkaConsumer.scala:83)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:529)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1.apply(DStream.scala:529)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:42)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
        at scala.util.Try$.apply(Try.scala:161)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:171)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
        at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:376)
        at org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:381)
        ... 29 more
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:274)
        at org.apache.hadoop.hive.ql.metadata.HiveUtils.getAuthorizeProviderManager(HiveUtils.java:366)
        ... 30 more


  1. I changed hive-site.xml to the following, but it still throws the same exception:

    <property>
      <name>hive.security.authenticator.manager</name>
      <value>org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator</value>
    </property>

    <property>
      <name>hive.security.authorization.enabled</name>
      <value>false</value>
    </property>

    <property>
      <name>hive.security.authorization.manager</name>
      <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
    </property>
    
  2. (Same hive-site.xml as in #1.) When I added hive-exec 1.0 to my dependencies, it threw a different exception (AbstractMethodError).

  3. (Same hive-site.xml as in #1.) I tried adding hive-exec 0.13 to my dependencies. It still throws the error on the first run (insert), but the second and subsequent inserts succeed.

  4. I am using the HDP 2.2 Sandbox (Hive 0.14.0.2.2.0.0-2041) and Spark 1.2.0.

    Dependencies:

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>0.13.0</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.10</artifactId>
            <version>1.2.0</version>
        </dependency>
    

3 Answers:

Answer 0 (score: 0)

The "SQLStdConfOnlyAuthorizerFactory" class was added in Hive 0.14.0 (HIVE-8045), but Spark 1.2 depends on Hive 0.13. Your hive-site.xml must have "hive.security.authorization.manager" set to "org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory", and your classpath has no hive-exec 0.14 JAR, which is why it throws ClassNotFoundException. So either put a hive-exec 0.14.0 JAR on your classpath (ahead of Spark's own Hive JARs), or change the entry in your hive-site.xml to:

<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>

The former is not recommended, because the Hive version mismatch may cause further problems of this kind.
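
If editing hive-site.xml is inconvenient, the same override can presumably also be applied per session from code; a minimal sketch, assuming Spark 1.2's SQLContext.setConf (which HiveContext forwards to the Hive configuration):

    // Hedged alternative to editing hive-site.xml: override the
    // authorization settings on the HiveContext before running the insert.
    hiveContext.setConf(
      "hive.security.authorization.manager",
      "org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider")
    hiveContext.setConf("hive.security.authorization.enabled", "false")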

Answer 1 (score: 0)

Changing the value of

    hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider

in hive-site.xml worked.

Answer 2 (score: 0)

I think this is because you have duplicate JARs on the classpath.
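
One quick way to check for that is to print which JAR each relevant Hive class is actually loaded from; a small diagnostic sketch (my addition, using only standard JVM reflection):

    // Print the code source of each class to spot duplicate or mixed Hive JARs.
    def whereIs(className: String): Unit =
      try {
        val src = Class.forName(className).getProtectionDomain.getCodeSource
        println(s"$className -> ${if (src != null) src.getLocation else "<bootstrap>"}")
      } catch {
        case _: ClassNotFoundException => println(s"$className -> not on classpath")
      }

    whereIs("org.apache.hadoop.hive.ql.session.SessionState")
    whereIs("org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory")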