PutHiveStreaming processor in NiFi throws NPE

Date: 2019-01-25 16:55:56

Tags: hive apache-nifi

I am debugging a custom HiveProcessor modeled on the official PutHiveStreaming processor, except that it writes to Hive 2.x instead of 3.x. The flow runs in a NiFi 1.7.1 cluster. Although this exception occurs, the data still gets written to Hive.

The exception is:


java.lang.NullPointerException: null
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:77)
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:54)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabases(HiveMetaStoreClient.java:1147)
    at org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient.isOpen(HiveClientCache.java:471)
    at sun.reflect.GeneratedMethodAccessor1641.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:169)
    at com.sun.proxy.$Proxy308.isOpen(Unknown Source)
    at org.apache.hive.hcatalog.common.HiveClientCache.get(HiveClientCache.java:270)
    at org.apache.hive.hcatalog.common.HCatUtil.getHiveMetastoreClient(HCatUtil.java:558)
    at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:95)
    at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:82)
    at org.apache.hive.hcatalog.streaming.StrictJsonWriter.<init>(StrictJsonWriter.java:60)
    at org.apache.nifi.util.hive.HiveWriter.lambda$getRecordWriter$0(HiveWriter.java:91)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.nifi.util.hive.HiveWriter.getRecordWriter(HiveWriter.java:91)
    at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:75)
    at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:46)
    at org.apache.nifi.processors.hive.PutHive2Streaming.makeHiveWriter(PutHive2Streaming.java:1152)
    at org.apache.nifi.processors.hive.PutHive2Streaming.getOrCreateWriter(PutHive2Streaming.java:1065)
    at org.apache.nifi.processors.hive.PutHive2Streaming.access$1000(PutHive2Streaming.java:114)
    at org.apache.nifi.processors.hive.PutHive2Streaming$1.lambda$process$2(PutHive2Streaming.java:858)
    at org.apache.nifi.processor.util.pattern.ExceptionHandler.execute(ExceptionHandler.java:127)
    at org.apache.nifi.processors.hive.PutHive2Streaming$1.process(PutHive2Streaming.java:855)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2211)
    at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2179)
    at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:808)
    at org.apache.nifi.processors.hive.PutHive2Streaming.lambda$onTrigger$4(PutHive2Streaming.java:672)
    at org.apache.nifi.processor.util.pattern.PartialFunctions.onTrigger(PartialFunctions.java:114)
    at org.apache.nifi.processor.util.pattern.RollbackOnFailure.onTrigger(RollbackOnFailure.java:184)
    at org.apache.nifi.processors.hive.PutHive2Streaming.onTrigger(PutHive2Streaming.java:672)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I would also like to reproduce the error. Could it be caught using TestRunners.newTestRunner(processor)? I am referring to the Hive 3.x test cases at https://github.com/apache/nifi/blob/ea9b0db2f620526c8dd0db595cf8b44c3ef835be/nifi-nar-bundles/nifi-hive-bundle/nifi-hive-processors/src/test/java/org/apache/nifi/processors/hive/TestPutHiveStreaming.java
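A minimal sketch of such a unit test with the nifi-mock TestRunner harness, mirroring the pattern in TestPutHiveStreaming (the class name PutHive2Streaming, its property constants, and the payload are assumptions about the custom processor; the org.apache.nifi:nifi-mock and JUnit artifacts are required on the test classpath):

```java
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.junit.Test;

public class PutHive2StreamingTest {

    @Test
    public void testRecordIsTransferredToSuccess() throws Exception {
        // Wrap the processor under test in the mock framework
        final TestRunner runner = TestRunners.newTestRunner(PutHive2Streaming.class);

        // Property constants are assumptions mirroring PutHiveStreaming
        runner.setProperty(PutHive2Streaming.METASTORE_URI, "thrift://localhost:9083");
        runner.setProperty(PutHive2Streaming.DB_NAME, "default");
        runner.setProperty(PutHive2Streaming.TABLE_NAME, "users");

        // Enqueue one flow file (use whatever payload format the
        // processor expects) and trigger the processor once
        runner.enqueue("{\"name\":\"user1\",\"favorite_number\":5}".getBytes());
        runner.run();

        // Expect the flow file on the success relationship
        runner.assertTransferCount(PutHive2Streaming.REL_SUCCESS, 1);
    }
}
```

Note that, as the accepted answer points out, this only exercises the NPE path if a real (or containerized) metastore is reachable, since the error originates in the metastore client.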

The alternative is to run Hive 2.x and NiFi containers locally. But then I have to run docker cp to copy the NAR package built by mvn, and attach a remote JVM from IntelliJ as described in this blog post: https://community.hortonworks.com/articles/106931/nifi-debugging-tutorial.html
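That copy-and-attach loop can be sketched roughly as follows (the container name, NAR filename, and paths are assumptions; the java.arg.debug line ships commented out in NiFi's conf/bootstrap.conf):

```shell
# Rebuild the NAR, then copy it into the running NiFi container's lib directory
mvn clean install
docker cp target/nifi-hive2-nar-1.0.0.nar nifi:/opt/nifi/nifi-current/lib/

# In conf/bootstrap.conf inside the container, uncomment the debug argument
# below and restart NiFi; then attach an IntelliJ "Remote JVM Debug" run
# configuration to port 8000:
# java.arg.debug=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
```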

Has anyone done something similar? Or is there an easier way to debug a custom processor?

2 answers:

Answer 0 (score: 1)

This is a red-herring error; there is some issue on the Hive side where it cannot get its own IP address or hostname, so it issues this error periodically. However, as you said, the data still gets written to Hive. I don't think it causes any real problems.

Just for completeness: in Apache NiFi, PutHiveStreaming is built to work with Hive 1.2.x, not Hive 2.x. There are currently no Hive 2.x-specific processors, and we have not determined whether the Hive 1.2.x processors work against Hive 2.x.

For debugging, if you can run Hive in a container and expose the metastore port (I believe the default is 9083), you should be able to create an integration test using things like TestRunners and run NiFi locally from your IDE. This is how other integration tests against external systems such as MongoDB or Elasticsearch are done.

There is a MiniHS2 class in the Hive test suite for integration testing, but it is not part of the published artifacts, so unfortunately we are left running tests against a real Hive instance.

Answer 1 (score: 0)

The NPE no longer appears after hcatalog.hive.client.cache.disabled is set to true.
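As a config fragment, the setting looks like this in hive-site.xml (filename assumed; the property can also be supplied via the processor's Hive configuration resources):

```xml
<property>
  <name>hcatalog.hive.client.cache.disabled</name>
  <value>true</value>
  <description>Disable the HCatalog metastore client cache so that each
    connection gets its own session instead of a shared cached client.</description>
</property>
```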

Kafka Connect also recommends this setting.

From the Kafka Connect documentation (https://docs.confluent.io/3.0.0/connect/connect-hdfs/docs/hdfs_connector.html):

    As connector tasks are long running, the connections to the Hive metastore are kept open until tasks are stopped. In the default Hive configuration, reconnecting to the Hive metastore creates a new connection. When the number of tasks is large, it is possible that retries can result in the number of open connections exceeding the maximum number of connections allowed in the operating system. Thus it is recommended to set hcatalog.hive.client.cache.disabled to true in hive.xml.

This property is also set to true automatically when Max Concurrent Tasks of PutHiveStreaming is set above 1.

NiFi's documentation addresses this as well:

    NiFi PutHiveStreaming has a connection pool and is therefore multithreaded. Setting hcatalog.hive.client.cache.disabled to true allows each connection to have its own session without relying on the cache.

Reference: https://community.hortonworks.com/content/supportkb/196628/hive-client-puthivestreaming-fails-against-partiti.html