设置hive.exec.pre.hooks时发生ClassNotFoundException

时间:2018-08-12 10:08:30

标签: java hadoop hive hook

我正在按照此文档进行配置单元挂钩:

http://dharmeshkakadia.github.io/hive-hook/

但是当show tables

时出现此错误
2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: hive.exec.pre.hooks Class not found: HiveExampleHook
2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
java.lang.ClassNotFoundException: HiveExampleHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=Driver.execute start=1534067858120 end=1534067858122 duration=2 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: Completed executing command(queryId=hive_20180812095757_e6516d83-ddc9-4f82-8151-def7e7f1eb37); Time taken: 0.002 seconds
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=releaseLocks start=1534067858122 end=1534067858122 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,130 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-315]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: HiveExampleHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
    ... 11 more

我确定最后一步add jar target/Hive-hook-example-1.0.jar;是错误的。

我尝试了以下操作:

  1. 我将jar文件放入hdfs / user / hive /中:

    添加jar hdfs:///user/hive/Hive-hook-example-1.0.jar;

  2. 我还将“ Hive辅助JAR目录”设置为 在Hiveserver2节点中的/home/centos/HiveExampleHook/target/Hive-hook-example-1.0.jar并重新启动Hive和beeline。

  3. 将jar文件复制到/opt/cloudera/parcels/CDH/jars/

  4. 将jar文件复制到/opt/cloudera/parcels/CDH/lib/hive/lib/

没有帮助。

有什么主意吗?

更新1:

如果我这样做LIST JARS;,则会显示

+----------------------------------------------------+--+
|                      resource                      |
+----------------------------------------------------+--+
| /tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar |
+----------------------------------------------------+--+

我也尝试了两种方法来做CREATE FUNCTION

CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook';
INFO  : Compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.002 seconds
INFO  : Executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO  : Starting task [Stage-0:FUNC] in serial mode
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO  : Completed executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.003 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

和...

CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar';
INFO  : Compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.004 seconds
INFO  : Executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO  : Starting task [Stage-0:FUNC] in serial mode
INFO  : converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
INFO  : Added [/tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar] to class path
INFO  : Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO  : Completed executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.03 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

因此很明显它可以找到jar,但找不到类名。我说的对吗?

更新2:

我尝试过:

[Hive-hook-example]# java -cp `pwd`/target/Hive-hook-example-1.0.jar HiveExampleHook

仍然得到这个:

Error: Could not find or load main class HiveExampleHook

我相信这是我犯的一些愚蠢的错误。

更新3:

好,我明白了。 您必须使用配置单元CLI而不是蜂巢式命令行才能正常工作。

hive> add jar hdfs:///user/hive/Hive-hook-example-1.0.jar;
add jar hdfs:///user/hive/Hive-hook-example-1.0.jar
converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
Added [/tmp/0a90132d-70cd-4ef0-b4cd-e75dc823e5ca_resources/Hive-hook-example-1.0.jar] to class path
Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
hive> set hive.exec.pre.hooks=HiveExampleHook;
set hive.exec.pre.hooks=HiveExampleHook
hive> show tables;
show tables
Hello from the hook !!
OK
test1
Time taken: 0.023 seconds, Fetched: 5 row(s)

那么问题是,如何在beeline中运行它?因为不推荐使用Hive CLI。

更新4:

我决定这样做: enter image description here

跑了beeline,看到了:

2018-08-12 16:39:13,286 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: <PERFLOG method=PreHook.HiveExampleHook from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 16:39:13,286 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: </PERFLOG method=PreHook.HiveExampleHook start=1534091953286 end=1534091953286 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2

这是一些进步,尽管我不确定这意味着什么以及是否已上课。我看不到任何输出。

1 个答案:

答案 0 :(得分:0)

对于beeline,添加jar时必须使用HDFS路径。请记住,beeline只是一个JDBC CLI,因此当您将add jar与本地路径一起使用时,它具有对本地路径的引用,集群上运行的hive会话无法访问它。

(感谢https://twitter.com/quanghoc/status/1028671393376874496的帮助。我是您引用的博客的作者。)