Google PubSub订阅无法从StatusCode.UNAVAILABLE [code = 8a75]错误

时间:2018-03-21 12:03:11

标签: python google-cloud-pubsub

例外:

  

线程中的异常Thread-ConsumeBidirectionalStream:grpc._channel._Rendezvous:< _Reddezvous的RPC终止于(StatusCode.UNAVAILABLE,该服务无法满足您的请求。请再试一次。[code = 8a75])>

     

上述异常是导致以下异常的直接原因:

     

google.api_core.exceptions.ServiceUnavailable:503该服务无法满足您的要求。请再试一次。 [代码= 8a75]

我正在尝试构建一个大致遵循Google端到端示例(docs | code)的IoT原型,并且我在订阅者中遇到错误当队列中没有消息时。当用户在大约一分钟后首次启动空队列以及处理任意数量的消息之后以及队列清空后一分钟左右时,都会发生这种情况。

我在StackOverflow找到了解决方法,但无法使其正常运行。 所以我的问题是如何让此解决方案政策正常运行,因为它似乎只是隐藏错误 - 我的订阅者仍然挂起并且不会处理任何其他消息。

代码的相关部分如下所示:

from google.cloud import pubsub
import google.cloud.pubsub_v1.subscriber.message as Message

from google.cloud.pubsub_v1.subscriber.policy import thread
import grpc

class WorkaroundPolicy(thread.Policy):
    def on_exception(self, exception):
        # If we are seeing UNAVALABLE then we need to retry (so return None)
        unavailable = grpc.StatusCode.UNAVAILABLE

        if isinstance(exception, ServiceUnavailable):
            logger.warning('Ignoring google.api_core.exceptions.ServiceUnavailable exception: {}'.format(exception))
            return
        elif getattr(exception, 'code', lambda: None)() in [unavailable]:
            logger.warning('Ignoring grpc.StatusCode.UNAVAILABLE (Orginal exception: {})'.format(exception))
            return

        # For anything else fall back on the parent class behaviour
        super(WorkaroundPolicy, self).on_exception(exception)


# Other imports and code ommitted for brevity

def callback(msg: Message):
    try:
        data = json.loads(msg.data)
    except ValueError as e:
        logger.error('Loading Payload ({}) threw an Exception: {}.'.format(msg.data, e))
        # For the prototype, if we can't read it, then discard it
        msg.ack()
        return

    device_project_id = msg.attributes['projectId']
    device_registry_id = msg.attributes['deviceRegistryId']
    device_id = msg.attributes['deviceId']
    device_region = msg.attributes['deviceRegistryLocation']

    self._update_device_config(
      device_project_id,
      device_region,
      device_registry_id,
      device_id,
      data)

    msg.ack()

def run(self, project_name, subscription_name):   
    # Specify a workaround policy to handle StatusCode.UNAVAILABLE [code=8a75] error (but may get CPU issues)
    #subscriber = pubsub.SubscriberClient(policy_class = WorkaroundPolicy)

    # Alternatively, instantiate subscriber without the workaround to see full exception stack
    subscriber = pubsub.SubscriberClient()

    subscription = subscriber.subscribe(subscription_path, callback)

    subscription.future.result()

    while True:
        time.sleep(60)

如果有帮助,可以在GitHub中找到完整版本。

堆栈跟踪/命令行输出(无策略解决方法)

(venv) Freds-MBP:iot-subscriber-issue Fred$ python Controller.py \
     --project_id=xyz-tests \
     --pubsub_subscription=simple-mqtt-controller \
     --service_account_json=/Users/Fred/_dev/gcp-credentials/simple-mqtt-controller-service-account.json

2018-03-21 09:36:20,975 INFO Controller Creating credentials from JSON Key File: "/Users/Fred/_dev/gcp-credentials/simple-mqtt-controller-service-account.json"...
2018-03-21 09:36:20,991 INFO Controller Creating service from discovery URL: "https://cloudiot.googleapis.com/$discovery/rest?version=v1"...
2018-03-21 09:36:20,992 INFO googleapiclient.discovery URL being requested: GET https://cloudiot.googleapis.com/$discovery/rest?version=v1
2018-03-21 09:36:21,508 INFO Controller Creating subscriber for project: "xyz-tests" and subscription: "simple-mqtt-controller"...
2018-03-21 09:36:23,200 INFO Controller Listening for messages on projects/xyz-tests/subscriptions/simple-mqtt-controller...

# This then occurs typically 60 seconds or so (sometimes up to 2 mins) later:

Exception in thread Thread-ConsumeBidirectionalStream:
Traceback (most recent call last):
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 76, in next
    return six.next(self._wrapped)
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/grpc/_channel.py", line 347, in __next__
    return self._next()
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/grpc/_channel.py", line 341, in _next
    raise self
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, The service was unable to fulfill your request. Please try again. [code=8a75])>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 349, in _blocking_consume
    for response in responses:
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 476, in _pausable_iterator
    yield next(iterator)
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 78, in next
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.ServiceUnavailable: 503 The service was unable to fulfill your request. Please try again. [code=8a75]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 363, in _blocking_consume
    request_generator, response_generator)
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/subscriber/_consumer.py", line 275, in _stop_request_generator
    if not response_generator.done():
AttributeError: '_StreamingResponseIterator' object has no attribute 'done'

^C

Traceback (most recent call last):
  File "Controller.py", line 279, in <module>
    if __name__ == '__main__':
  File "Controller.py", line 274, in main
    try:
  File "Controller.py", line 196, in run

  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/futures.py", line 111, in result
    err = self.exception(timeout=timeout)
  File "/Users/Fred/_dev/datacentricity-public-samples/iot-subscriber-issue/venv/lib/python3.6/site-packages/google/cloud/pubsub_v1/futures.py", line 133, in exception
    if not self._completed.wait(timeout=timeout):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 551, in wait
    signaled = self._cond.wait(timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 295, in wait
    waiter.acquire()
KeyboardInterrupt
(venv) Freds-MBP:iot-subscriber-issue Fred$

这似乎是一个持续的问题,在GitHub中查看以下问题(所有这些问题现已关闭):

Pub/Sub Subscriber does not catch & retry UNAVAILABLE errors #4234

Pub/Sub has no way to track errors from the subscriber thread. #3888

PubSub: set_exception can only be called once. can still occur "erroneously" #4463

我还在StackOverflow上找到了以下项目:

Google PubSub python client returning StatusCode.UNAVAILABLE 于2017年10月发布,答案是我在上面的代码中尝试过的策略类解决方法。虽然,至少在我的代码中,提出的答案只隐藏错误但不允许处理新消息。

Google Pubsub Python Client library subscriber crashes randomly 似乎是同一个原因,但用例不同。答案(由提问者提供)表明升级到最新的google-cloud解决了这个问题,但我已经在使用最新版本的google-api-core(1.1.0)和google-cloud-pubsub(0.32.1)等。

Google Pub/Sub Subscriber not receiving messages after a while 可能有关系,但没有确凿的答案。

其他信息: 操作系统:Mac OS X El Capitan 10.11.6 Python 3.6.2在virtualenv 15.1.0中运行

(部分)pip冻结输出:

google-api-core==1.1.0 
google-api-python-client==1.6.5 
google-auth==1.4.1 
google-auth-httplib2==0.0.3 
google-cloud-pubsub==0.32.1 
googleapis-common-protos==1.5.3 
grpc-google-iam-v1==0.11.4 
grpcio==1.10.0 
httplib2==0.10.3 
paho-mqtt==1.3.1 

1 个答案:

答案 0 :(得分:1)

我看到了与你类似的行为。除了解决方案策略之外,该进程始终挂起而不会抛出任何异常。

更奇怪的是,它是在同事机器而不是我的机器上工作。我已经尝试了另外两台机器和docker容器并且都挂了。

接下来,我要尝试将所有内容删除到最小的容器化示例,我可以将其带到Google,并希望从他们那里得到一些反馈。

如果你在过渡期间学到了什么,请告诉我。