Kudu Client fails with an exception after running for several days

Time: 2018-10-24 18:46:32

Tags: apache-spark cloudera apache-kudu

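I have a Scala/Spark/Kafka process running. When I first start the process, I create a KuduClient object using a function that I share between classes. For this job I create the KuduClient only once and then let the process run continuously. I have noticed that after a few days I frequently start getting exceptions.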

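I'm not quite sure what to do about this. I thought one option might be to create a new Kudu client every day or so, but I'm not sure how to do that either.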
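One way to recreate the client periodically is to wrap it in a small holder that builds a new instance once the current one is older than a given age and closes the old one. This is a minimal sketch, assuming `KuduClient` implements `java.lang.AutoCloseable` (it does in recent Kudu Java clients); `RefreshingResource` is a hypothetical helper, not part of the Kudu API, and whether a periodic rebuild actually avoids the exception below is untested.

```scala
import scala.util.control.NonFatal

/** Hypothetical helper (not part of the Kudu API): holds one AutoCloseable
  * resource and replaces it once it is older than `maxAgeMs`. */
final class RefreshingResource[T <: AutoCloseable](
    factory: () => T,
    maxAgeMs: Long,
    clock: () => Long = () => System.currentTimeMillis()
) {
  private var current: Option[T] = None
  private var createdAt: Long = 0L

  /** Returns the cached resource, rebuilding it first if it has expired. */
  def get(): T = synchronized {
    val now = clock()
    current match {
      case Some(resource) if now - createdAt < maxAgeMs =>
        resource
      case previous =>
        // Close the expired instance, if any; a failure while closing is ignored
        previous.foreach { old =>
          try old.close() catch { case NonFatal(_) => () }
        }
        current = None // so a failing factory() does not leave a closed instance cached
        val fresh = factory()
        current = Some(fresh)
        createdAt = now
        fresh
    }
  }

  /** Closes the current resource, e.g. from a shutdown hook. */
  def close(): Unit = synchronized {
    current.foreach(_.close())
    current = None
  }
}
```

With the `createKuduClient` function from the code below, the holder could be built as `new RefreshingResource(() => createKuduClient(config), java.util.concurrent.TimeUnit.DAYS.toMillis(1))`, calling `.get()` wherever `client` is used now. Two caveats: closing the old client while another thread is still using it will make that thread's call fail, and in Spark each executor JVM would get its own holder, since the `lazy val` lives in an `object`.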

import org.apache.spark.SparkConf
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.json.JSONObject
import org.apache.kudu.client.KuduClient
import org.apache.log4j.Logger
import com.typesafe.config.Config // assumption: the config object is a Typesafe Config

object Thing extends Serializable {

  @transient lazy val client: KuduClient = createKuduClient(config)
  @transient lazy val logger: Logger = Logger.getLogger(getClass.getName)

  def main(args: Array[String]): Unit = {

    UtilFunctions.loadConfig(args) //I send back a config object.
    UtilFunctions.loadLogger() //factory method to load logger

    val props: Map[String, String] = setKafkaProperties()

    val topic = Set(config.getString("config.TOPIC_NAME"))

    val conf = new SparkConf().setMaster("local[2]").setAppName(config.getString("config.SPARK_APP_NAME"))
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.sparkContext.setLogLevel("ERROR")
    ssc.checkpoint(config.getString("config.SPARK_CHECKPOINT_NAME"))

    // val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, props, topic)
    val kafkaStream = KafkaUtils.createDirectStream[String, String](ssc, PreferConsistent, Subscribe[String, String](topic, props))
    val distRecordsStream = kafkaStream
      .map(record => (record.key(), record.value()))
      // window() returns a new DStream; calling it without using the result has no effect
      .window(Seconds(10), Seconds(10))
    distRecordsStream.foreachRDD(distRecords => {
      logger.info(distRecords + " : " + distRecords.count())
      distRecords.foreach(record => {
        logger.info(record._2)
        MyClass.DoSomethingWithThisData(new JSONObject(record._2), client)
      })
    })

    ssc.start()
    ssc.awaitTermination()
  }

  def createKuduClient(config: Config): KuduClient = {
    var client: KuduClient = null
    try{
      client = new KuduClient.KuduClientBuilder(config.getString("config.KUDU_MASTER"))
        .defaultAdminOperationTimeoutMs(config.getInt("config.KUDU_ADMIN_TIMEOUT_S") * 1000)
        .defaultOperationTimeoutMs(config.getInt("config.KUDU_OPERATION_TIMEOUT_S") * 1000)
        .build()
    }
    catch {
      case e: Throwable =>
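        // Passing the Throwable lets log4j print the full stack trace;
        // e.getStackTrace.toString only prints the array's identity string
        logger.error("Failed to create KuduClient, retrying in 10 s", e)
        Thread.sleep(10000) // wait 10 s before retrying
        client = createKuduClient(config) // retries indefinitely via recursion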
    }
    client //return
  }

  def setKafkaProperties(): Map[String, String] = {


    val zookeeper = config.getString("config.ZOOKEEPER")
    val offsetReset = config.getString("config.OFFSET_RESET")
    val brokers = config.getString("config.BROKERS")
    val groupID = config.getString("config.GROUP_ID")
    val deserializer = config.getString("config.DESERIALIZER")
    val autoCommit = config.getString("config.AUTO_COMMIT")
    val maxPollRecords = config.getString("config.MAX_POLL_RECORDS")
    val maxPollIntervalms = config.getString("config.MAX_POLL_INTERVAL_MS")

    val props = Map(
      "bootstrap.servers" -> brokers,
      "zookeeper.connect" -> zookeeper,
      "group.id" -> groupID,
      "key.deserializer" -> deserializer,
      "value.deserializer" -> deserializer,
      "enable.auto.commit" -> autoCommit,
      "auto.offset.reset" -> offsetReset,
      "max.poll.records" -> maxPollRecords,
      "max.poll.interval.ms" -> maxPollIntervalms)
    props
  }

}
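`createKuduClient` above retries by calling itself every 10 seconds with no upper bound, so a persistent failure keeps the job looping. A bounded alternative could look like the sketch below; `Retry.withRetry` is a hypothetical helper, not part of the Kudu or Spark API, and because it uses `scala.util.Try` it retries only non-fatal exceptions.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  /** Hypothetical helper: evaluates `op` up to `maxAttempts` times, sleeping
    * `delayMs` between failed attempts, and rethrows the last failure.
    * Fatal errors (e.g. OutOfMemoryError) are not caught by Try and propagate at once. */
  @tailrec
  def withRetry[T](maxAttempts: Int, delayMs: Long)(op: => T): T =
    Try(op) match {
      case Success(value) => value
      case Failure(_) if maxAttempts > 1 =>
        // A log statement for the failed attempt would go here
        Thread.sleep(delayMs)
        withRetry(maxAttempts - 1, delayMs)(op)
      case Failure(e) => throw e
    }
}
```

For example, `Retry.withRetry(maxAttempts = 6, delayMs = 10000L) { new KuduClient.KuduClientBuilder(config.getString("config.KUDU_MASTER")).build() }` would give up after about a minute and surface the last exception instead of retrying forever.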

Here is the exception. I have removed the IP addresses and replaced them with "x".

  

ERROR client.TabletClient: [Peer master-ip-xxx-xx-xxx-40.ec2.internal:7051] Unexpected exception from downstream on [id: 0x42ba3f4d, /xxx.xx.xxx.39:36820 => ip-xxx-xxx-xxx-40.ec2.internal/xxx.xx.xxx.40:7051]
java.lang.RuntimeException: Could not deserialize the response, incompatible RPC? Error is: step
    at org.apache.kudu.client.KuduRpc.readProtobuf(KuduRpc.java:383)
    at org.apache.kudu.client.Negotiator.parseSaslMsgResponse(Negotiator.java:282)
    at org.apache.kudu.client.Negotiator.handleResponse(Negotiator.java:235)
    at org.apache.kudu.client.Negotiator.messageReceived(Negotiator.java:229)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.timeout.ReadTimeoutHandler.messageReceived(ReadTimeoutHandler.java:184)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.apache.kudu.client.shaded.org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.apache.kudu.client.shaded.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.apache.kudu.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.apache.kudu.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

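After the job has run for a while I have also seen similar exceptions, which others seem to attribute to the open file handle limit for the user running the process.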

  

java.io.IOException: All datanodes DatanodeInfoWithStorage[xxx.xx.xxx.36:1004,DS-55c403c3-203a-4dac-b383-72fcdb686185,DISK] are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1465)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1236)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputSt

Could this be related to having too many open files? If so, is there a way to "clean up" the open files once the limit is reached?
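To check whether the job is actually approaching the open file limit, the driver could log its own descriptor count periodically, for example once per batch inside `foreachRDD`. This sketch relies on `com.sun.management.UnixOperatingSystemMXBean`, which HotSpot/OpenJDK JVMs expose on Unix-like systems; `FileDescriptorUsage` is a hypothetical name. It only reports the JVM it runs in (with `local[2]` that is the whole job), and it does not free anything: a count that keeps rising across batches would point to something being opened repeatedly and never closed.

```scala
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

object FileDescriptorUsage {
  /** Returns (open, max) file descriptors for the current JVM, or None when the
    * platform MXBean does not expose them (e.g. on Windows). */
  def current(): Option[(Long, Long)] =
    ManagementFactory.getOperatingSystemMXBean match {
      case unix: UnixOperatingSystemMXBean =>
        Some((unix.getOpenFileDescriptorCount, unix.getMaxFileDescriptorCount))
      case _ => None
    }

  /** Builds a one-line message suitable for logger.info. */
  def describe(): String = current() match {
    case Some((open, max)) => s"open file descriptors: $open / $max"
    case None              => "open file descriptor count not available on this JVM"
  }
}
```

Calling `logger.info(FileDescriptorUsage.describe())` at the start of each `foreachRDD` would give a time series to compare against the moment the exceptions start.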
