I'm debugging a HiveProcessor that follows the official PutHiveStreaming processor, but it writes to Hive 2.x instead of 3.x. The flow runs in a NiFi 1.7.1 cluster. Although the exception below occurs, data still gets written to Hive.
The exception is:
java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1147)
at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:471)
at sun.reflect.GeneratedMethodAccessor1641.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
at com.sun.proxy.$Proxy308.isOpen(Unknown Source)
at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:270)
at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:95)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:82)
at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:60)
at org.apache.nifi.util.hive.HiveWriter.lambda$getRecordWriter$0(HiveWriter.java:91)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.nifi.util.hive.HiveWriter.getRecordWriter(HiveWriter.java:91)
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:75)
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
at org.apache.nifi.processors.hive.PutHive2Streaming.makeHiveWriter(PutHive2Streaming.java:1152)
at org.apache.nifi.processors.hive.PutHive2Streaming.getOrCreateWriter(PutHive2Streaming.java:1065)
at org.apache.nifi.processors.hive.PutHive2Streaming.access$1000(PutHive2Streaming.java:114)
at org.apache.nifi.processors.hive.PutHive2Streaming$1.lambda$process$2(PutHive2Streaming.java:858)
at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
at org.apache.nifi.processors.hive.PutHive2Streaming$1.process(PutHive2Streaming.java:855)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2211)
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2179)
at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:808)
at org.apache.nifi.processors.hive.PutHive2Streaming.lambda$onTrigger$4(PutHive2Streaming.java:672)
at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:672)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I'd also like to reproduce the error. Can it be caught using TestRunners.newTestRunner(processor);? I'm referring to the Hive 3.x test cases at:
https://github.com/apache/nifi/blob/ea9b0db2f620526c8dd0db595cf8b44c3ef835be/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestPutHiveStreaming.java
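For context, the kind of skeleton I have in mind is sketched below; HiveProcessor, the property names, and the sample record are stand-ins for my custom processor, not the official test's actual setup:

    import org.apache.nifi.util.TestRunner;
    import org.apache.nifi.util.TestRunners;
    import org.junit.Test;

    public class HiveProcessorTest {
        @Test
        public void testJsonRecordIsStreamed() {
            // HiveProcessor and the property names below are placeholders
            // for the custom processor under test.
            TestRunner runner = TestRunners.newTestRunner(new HiveProcessor());
            runner.setProperty("Hive Metastore URI", "thrift://localhost:9083");
            runner.setProperty("Database Name", "default");
            runner.setProperty("Table Name", "users");
            // Enqueue one JSON record (the processor uses StrictJsonWriter).
            runner.enqueue("{\"name\":\"joe\",\"favorite_number\":3}".getBytes());
            runner.run();
            runner.assertTransferCount("success", 1);
        }
    }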
The alternative is to run Hive 2.x and NiFi containers locally. But then I'd have to run docker cp to copy the nar package built by mvn, and attach a remote JVM from IntelliJ as described in this blog post:
https://community.hortonworks.com/articles/106931/nifi-debugging-tutorial.html
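Concretely, that would mean something like the following (paths and file names here are illustrative, not taken from the blog):

    # copy the freshly built nar into the container's lib directory
    docker cp target/nifi-myhive-nar-1.0.nar nifi:/opt/nifi/nifi-current/lib/

    # in conf/bootstrap.conf, uncomment the JDWP line so IntelliJ can attach
    # (the port also has to be published, e.g. docker run -p 8000:8000 ...)
    java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000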
Has anyone done something similar, or is there an easier way to debug a custom processor?
Answer 0 (score: 1)
That's a red herring error; there is some issue on the Hive side where it can't get its own IP address or hostname, and it issues this error periodically. However, as you said, the data does get written to Hive, so I don't think it causes any real problems.
Just for completeness: in Apache NiFi, PutHiveStreaming is built to work against Hive 1.2.x, not Hive 2.x. There are currently no Hive 2.x-specific processors, and we haven't determined whether the Hive 1.2.x processors will work against Hive 2.x.
For debugging, if you can run Hive in a container with the metastore port exposed (I believe the default is 9083), you should be able to create an integration test using TestRunners and run NiFi locally from your IDE. That's how other integration tests against external systems such as MongoDB or Elasticsearch are done.
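As a rough sketch (assuming the container publishes the metastore on localhost:9083, and with the Avro payload construction elided), such a test could look like:

    import org.apache.nifi.processors.hive.PutHiveStreaming;
    import org.apache.nifi.util.TestRunner;
    import org.apache.nifi.util.TestRunners;

    // Point the real processor at the containerized metastore instead of a mock.
    TestRunner runner = TestRunners.newTestRunner(PutHiveStreaming.class);
    runner.setProperty(PutHiveStreaming.METASTORE_URI, "thrift://localhost:9083");
    runner.setProperty(PutHiveStreaming.DB_NAME, "default");
    runner.setProperty(PutHiveStreaming.TABLE_NAME, "users");
    runner.enqueue(avroRecordBytes); // placeholder: PutHiveStreaming expects Avro input
    runner.run();
    runner.assertTransferCount(PutHiveStreaming.REL_SUCCESS, 1);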
There is a MiniHS2 class in the Hive test suite for integration testing, but it is not part of any published artifact, so unfortunately we're left with running tests against a real Hive instance.
Answer 1 (score: 0)
The NPE no longer shows up after setting hcatalog.hive.client.cache.disabled to true. This setting is also recommended by Kafka Connect.
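For reference, this is a plain Hive property, so it can go into the hive-site.xml that the processor's Hive Configuration Resources property points to:

    <property>
      <name>hcatalog.hive.client.cache.disabled</name>
      <value>true</value>
    </property>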
From the Kafka Connect documentation at https://docs.confluent.io/3.0.0/connect/connect-hdfs/docs/hdfs_connector.html:

As connector tasks are long running, the connections to the Hive metastore are kept open until tasks are stopped. In the default Hive configuration, reconnecting to the Hive metastore creates a new connection. When the number of tasks is large, it is possible that the retries can cause the number of open connections to exceed the maximum number of allowed connections in the operating system. Thus it is recommended to set hcatalog.hive.client.cache.disabled to true in hive.xml.
This property is set to true automatically when Max Concurrent Tasks of PutHiveStreaming is set to more than 1.
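That guard looks roughly like the following (a paraphrase of the processor's setup code, not a verbatim copy):

    // If more than one concurrent task, force the client cache off so every
    // task gets its own metastore session instead of a shared cached client.
    if (context.getMaxConcurrentTasks() > 1) {
        hiveConfig.setBoolean("hcatalog.hive.client.cache.disabled", true);
    }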
NiFi's documentation also addresses this:

NiFi PutHiveStreaming has a pool of connections and is therefore multithreaded; setting hcatalog.hive.client.cache.disabled to true allows each connection to set up its own session without relying on the cache.