Spark Streaming: broadcast variable to a custom receiver

Date: 2019-05-15 20:13:24

Tags: scala apache-spark spark-streaming

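I created an application that uses Spark Streaming with a custom receiver for Google Pub/Sub.

I have hit a performance limit and want to drop messages without processing them. My idea is to control what gets passed to store(). To read messages I use the apache/bahir receiver: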

                // Pull a batch of messages from the Pub/Sub subscription.
                val pullResponse = client.projects().subscriptions().pull(subscriptionFullName, pullRequest).execute()
                val receivedMessages = pullResponse.getReceivedMessages.asScala.toList
                Utils.LOG.info(s"receivedMessages from PUB/SUB ${receivedMessages.size}")
                rateLimiter.acquire(receivedMessages.size)

                // Read the current drop factor; fall back to 0 (drop nothing) if the broadcast is missing.
                val factor: Int =
                    if (dropFactorBroad != null) {
                        dropFactorBroad.value
                    } else {
                        Utils.LOG.info("dropFactorBroad is null")
                        0
                    }

                // Drop the last `factor` messages (clamped to the batch size) and keep the rest.
                val endIndex = math.min(factor, receivedMessages.length)
                val messagesToStore = receivedMessages.slice(0, receivedMessages.length - endIndex)

                // Hand the kept messages to Spark.
                store(messagesToStore.map { x =>
                      val sm = new SparkPubsubMessage
                      sm.message = x.getMessage
                      sm
                  }
                  .iterator)

                // Acknowledge every pulled message, including the dropped ones.
                val ackRequest = new AcknowledgeRequest()
                ackRequest.setAckIds(receivedMessages.map(x => x.getAckId).asJava)
                client.projects().subscriptions().acknowledge(subscriptionFullName, ackRequest).execute()
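
The drop step in the snippet above can be isolated into a small pure function, which makes it easier to check without Spark or Pub/Sub. This is only a minimal sketch: `DropLogic` and `keepAfterDrop` are hypothetical names that do not appear in the question, and negative drop counts are clamped to zero here as an assumption.

```scala
object DropLogic {
  /** Keeps the first (size - dropCount) elements.
    * dropCount is clamped to the range [0, size], so a too-large value
    * drops everything and a negative value drops nothing.
    */
  def keepAfterDrop[A](messages: List[A], dropCount: Int): List[A] = {
    val clamped = math.max(0, math.min(dropCount, messages.length))
    messages.take(messages.length - clamped)
  }
}
```

With such a helper, the receiver would call something like `store(DropLogic.keepAfterDrop(receivedMessages, factor).map(...).iterator)` while still acknowledging the full `receivedMessages` list.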

dropFactorBroad is a broadcast variable that is updated on every onBatchCompleted (it is unpersisted and created again).

It doesn't work; I get:

java.lang.NullPointerException
    at com.mag.ingester.ReceiverDropFactorBroadcaster.value(ReceiverDropFactorBroadcaster.scala:20)
    at com.mag.pubSubReceiver.PubsubReceiver.receive(PubsubInputDStream.scala:260)
    at com.mag.pubSubReceiver.PubsubReceiver$$anon$1.run(PubsubInputDStream.scala:244)

ReceiverDropFactorBroadcaster is dropFactorBroad.
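
One possible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis, is that a receiver is serialized once when it is started and then runs on an executor, so state that is re-created on the driver afterwards never reaches the running copy. The sketch below illustrates only that general snapshot behaviour with plain Java serialization and no Spark at all; `FakeReceiver`, `DriverState`, and `SnapshotDemo` are hypothetical names invented for the illustration.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical stand-in for a receiver that captures a value when it is constructed.
class FakeReceiver(val dropFactor: Int) extends Serializable

// Hypothetical driver-side holder whose field is reassigned after every batch.
object DriverState {
  @volatile var dropFactor: Int = 0
}

object SnapshotDemo {
  // Round-trips an object through Java serialization, loosely similar
  // to shipping it to another JVM.
  def roundTrip[A <: java.io.Serializable](value: A): A = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[A] finally in.close()
  }

  def main(args: Array[String]): Unit = {
    DriverState.dropFactor = 1
    // The copy is created from the value that existed at serialization time.
    val shipped = roundTrip(new FakeReceiver(DriverState.dropFactor))
    // A later driver-side update does not touch the already shipped copy.
    DriverState.dropFactor = 5
    println(s"driver=${DriverState.dropFactor}, shipped copy=${shipped.dropFactor}")
  }
}
```

In this sketch the shipped copy keeps the value it was built with while the driver-side field moves on. Whether the same mechanism explains the NullPointerException in ReceiverDropFactorBroadcaster depends on how that class holds its broadcast reference, which the question does not show.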

How can I control what the receiver stores? Should I kill the receiver, change the variable, and restart it? (If so, how?)

Thanks

0 answers:

No answers