Storm: storm-hdfs HDFS bolt fails after 24 hours

Date: 2016-03-04 10:24:04

Tags: hadoop apache-kafka apache-storm hadoop2 hadoop-streaming

My Storm topology, which reads from Kafka and writes to Hadoop HDFS, fails completely after 24 hours!

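I suspect the problem is that the topology is unable to renew its token, or cannot find the keytab entry it needs to renew it. Please share your thoughts and help me resolve this.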
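If that suspicion is right, the 24-hour mark would line up with a Kerberos ticket lifetime of 24 hours, which is a common setting, and the usual remedy is to re-login from the keytab before the ticket expires. In a real topology that decision is normally delegated to Hadoop's `UserGroupInformation` (for example its `checkTGTAndReloginFromKeytab()` method) rather than hand-rolled. The JDK-only sketch below just illustrates the timing decision; the class name `TicketRefresh` and the 80% renewal point are assumptions for illustration, not storm-hdfs behavior.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (JDK only): decides when a long-running worker should
// re-login from its keytab, based on the current ticket's start and end time.
final class TicketRefresh {

    // Re-login once 4/5 (80%) of the ticket lifetime has elapsed.
    // The 80% point is an assumption for illustration.
    static Instant refreshTime(Instant ticketStart, Instant ticketEnd) {
        if (!ticketEnd.isAfter(ticketStart)) {
            throw new IllegalArgumentException("ticket end must be after ticket start");
        }
        Duration lifetime = Duration.between(ticketStart, ticketEnd);
        // Integer Duration arithmetic avoids floating-point rounding.
        return ticketStart.plus(lifetime.multipliedBy(4).dividedBy(5));
    }

    // True once 'now' has reached the refresh point, i.e. before the ticket expires.
    static boolean needsRelogin(Instant ticketStart, Instant ticketEnd, Instant now) {
        return !now.isBefore(refreshTime(ticketStart, ticketEnd));
    }
}
```

With a 24-hour ticket, `refreshTime` falls 19 h 12 min after the ticket start, leaving a margin before the point where the topology in the question fails.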

Please find below the code used to configure the HDFS bolt.

Config object:

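import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import org.apache.storm.hbase.security.HBaseSecurityUtil;
import org.apache.storm.hdfs.common.security.HdfsSecurityUtil;

//building a 'map' with hdfs related configuration for the keytab
Map<String, Object> hdfsSecConfigMap = new HashMap<String, Object>();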
hdfsSecConfigMap.put("hdfs.keytab.file", ktPath);
hdfsSecConfigMap.put("hdfs.kerberos.principal", ktPrincipal);

//building a 'map' with hbase related configuration
Map<String, Object> hbaseConfigMap = new HashMap<String, Object>();
hbaseConfigMap.put("hbase.rootdir", hbaseRootDir);
hbaseConfigMap.put("storm.keytab.file", ktPath);
hbaseConfigMap.put("storm.kerberos.principal", ktPrincipal);

Config configured = new Config();
configured.setDebug(true);
configured.put(hdfsConfKey, hdfsSecConfigMap);
configured.put(hbaseConfKey, hbaseConfigMap);
configured.setNumWorkers(2);
configured.setMaxSpoutPending(300);
configured.setNumAckers(30);
configured.setMessageTimeoutSecs(1200);

configured.put(HdfsSecurityUtil.STORM_KEYTAB_FILE_KEY, ktPath);
configured.put(HdfsSecurityUtil.STORM_USER_NAME_KEY, ktPrincipal);

configured.put(HBaseSecurityUtil.STORM_KEYTAB_FILE_KEY, ktPath);
configured.put(HBaseSecurityUtil.STORM_USER_NAME_KEY, ktPrincipal);
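Since the suspicion includes the keytab not being found, it can help to fail fast when the HDFS security map is built instead of discovering the problem hours later. The JDK-only sketch below builds the same two-entry map as `hdfsSecConfigMap` above (reusing the key names from the question) and rejects an unreadable keytab or an empty principal; the class name `HdfsSecurityConfig` is hypothetical. Note that the HDFS bolt logs in on the worker, so this check only covers the machine it runs on: the keytab must also exist at the same path on every supervisor node.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (JDK only): builds the HDFS security map and
// fails fast if the keytab or principal is obviously unusable.
final class HdfsSecurityConfig {
    // Key names as used in the question's hdfsSecConfigMap.
    static final String KEYTAB_KEY = "hdfs.keytab.file";
    static final String PRINCIPAL_KEY = "hdfs.kerberos.principal";

    static Map<String, Object> build(String keytabPath, String principal) {
        Path keytab = Paths.get(keytabPath);
        // Only checks the local filesystem of the process calling build().
        if (!Files.isRegularFile(keytab) || !Files.isReadable(keytab)) {
            throw new IllegalArgumentException("Keytab not readable: " + keytabPath);
        }
        if (principal == null || principal.trim().isEmpty()) {
            throw new IllegalArgumentException("Kerberos principal must not be empty");
        }
        Map<String, Object> conf = new HashMap<String, Object>();
        conf.put(KEYTAB_KEY, keytabPath);
        conf.put(PRINCIPAL_KEY, principal);
        return conf;
    }
}
```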

Retrieving the HDFS bolt:

HdfsBolt hdfsbolt = new HdfsBolt()
        .withFsUrl(hdfsuri)
        .withRecordFormat(recFormat)
        .withFileNameFormat(fileNameWithPath)
        .withRotationPolicy(fileRotationSize)
        .withSyncPolicy(syncPolicy)
        .withConfigKey(secBypassConfigKey);
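The `syncPolicy` passed to `withSyncPolicy` decides when the bolt forces its buffered output to HDFS, and in the stack trace of this question the failure surfaces exactly there, in `HdfsBolt.execute` calling `hsync`. storm-hdfs ships a count-based policy, `CountSyncPolicy`, that requests a sync after a fixed number of tuples. The JDK-only class below mimics that mark/reset contract to show when syncs happen; it is a hypothetical stand-in, not the storm-hdfs interface (whose `mark` method also receives the tuple and the current file offset).

```java
// Hypothetical stand-in for a count-based sync policy (JDK only).
// mark() is called once per written tuple and returns true when the
// bolt should hsync; reset() is called after the sync completes.
final class CountingSyncPolicy {
    private final int count;
    private int executeCount = 0;

    CountingSyncPolicy(int count) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        this.count = count;
    }

    boolean mark() {
        executeCount++;
        return executeCount >= count;
    }

    void reset() {
        executeCount = 0;
    }
}
```

With a count of 3, every third tuple triggers a sync, so an expired credential would only show up on those tuples rather than on every write.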

The TopologyBuilder setup:

builder.setBolt("hdfsBolt", avroHDFSBolt, 1)
        .setNumTasks(1)
        .shuffleGrouping("kafka-spout");

The exception I get is the following:

java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "**********"; destination host is: "***************":8020;
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2082) ~[stormjar.jar:?]
        at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:1969) ~[stormjar.jar:?]
        at org.apache.hadoop.hdfs.client.HdfsDataOutputStream.hsync(HdfsDataOutputStream.java:95) ~[stormjar.jar:?]
        at org.apache.storm.hdfs.bolt.HdfsBolt.execute(HdfsBolt.java:100) [stormjar.jar:?]
        at backtype.storm.daemon.executor$fn__3697$tuple_action_fn__3699.invoke(executor.clj:670) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.daemon.executor$mk_task_receiver$fn__3620.invoke(executor.clj:426) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.disruptor$clojure_handler$reify__3196.onEvent(disruptor.clj:58) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.daemon.executor$fn__3697$fn__3710$fn__3761.invoke(executor.clj:808) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at backtype.storm.util$async_loop$fn__544.invoke(util.clj:475) [storm-core-0.10.0.2.3.4.0-3485.jar:0.10.0.2.3.4.0-3485]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
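The decisive part of this trace is the innermost cause, `GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)`: when the bolt contacted the destination host on port 8020 (the usual NameNode RPC port), the client had no valid Kerberos ticket-granting ticket. For logging or alerting, a JDK-only sketch can walk the cause chain and match the message fragments seen here; the class name and the chosen fragments are assumptions, and matching on messages is brittle across Hadoop and JDK versions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical diagnostic (JDK only): does this failure look like
// missing or expired Kerberos credentials?
final class KerberosFailureCheck {
    // Message fragments taken from the exception in the question.
    private static final String[] MARKERS = {
        "GSS initiate failed",
        "Failed to find any Kerberos tgt"
    };

    static boolean isCredentialFailure(Throwable t) {
        // Identity set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (msg == null) {
                continue;
            }
            for (String marker : MARKERS) {
                if (msg.contains(marker)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```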

1 answer:

Answer (score: 0)

I was able to resolve this issue after rebuilding my code/application against the correct Hadoop version, i.e. the one used in our Hadoop cluster.

The problem was caused by a version mismatch, and rebuilding against the correct version fixed it!
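The stack trace supports this diagnosis in one respect: the HDFS client classes (`org.apache.hadoop.hdfs.DFSOutputStream` and friends) are loaded from `stormjar.jar`, so the Hadoop client version is whatever the topology was built with, not what the cluster runs. One way to catch a mismatch early is to compare that version with the one the cluster reports (for example via `hadoop version`, or at runtime via `org.apache.hadoop.util.VersionInfo.getVersion()` from hadoop-common). The JDK-only sketch below only does the comparison; the class name and the decision to compare the leading major.minor.patch are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check (JDK only): do two Hadoop version strings share the
// same major.minor.patch release? Vendor builds often append a suffix.
final class HadoopVersionCheck {
    private static final Pattern RELEASE = Pattern.compile("^(\\d+)\\.(\\d+)\\.(\\d+)");

    // Extracts the leading "x.y.z" from a version string.
    static String release(String version) {
        Matcher m = RELEASE.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        return m.group(1) + "." + m.group(2) + "." + m.group(3);
    }

    static boolean sameRelease(String builtAgainst, String cluster) {
        return release(builtAgainst).equals(release(cluster));
    }
}
```

A stricter check could compare the full strings, which would also catch differing vendor builds of the same upstream release.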