Google PubSub returns google.gax.errors.GaxError: GaxError(RPC failed, caused by ... StatusCode.UNAVAILABLE

Time: 2016-11-09 22:47:08

Tags: google-cloud-platform google-cloud-pubsub

After an event occurs on one of our distributed systems, we try to do a simple publish to an existing topic.

The code looks like this:

try:
  dat = data.encode('utf-8')
  topic.publish(dat)
except:
  <code to recover>

If we catch all exceptions and print the traceback, we get:

  

google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1478711654.067744009","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478711654.067706801","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>

(Full error below)

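Looking at http://gcloud-python.readthedocs.io/en/latest/pubsub-topic.html#google.cloud.pubsub.topic.Topic.publish, it seems this GAX error is not something we should have to handle ourselves. However, if we catch the error and retry with exponential backoff, the publish usually succeeds on the second attempt.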
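The catch-and-retry approach described above could be sketched roughly as follows. This is a minimal illustration, not the library's own retry mechanism: the helper name `retry_with_backoff` and all of its parameters (attempt count, delays, jitter) are assumptions chosen for the example.

```python
import random
import time


def retry_with_backoff(func, retriable=(Exception,), max_attempts=5,
                       base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying on `retriable` errors with exponential backoff.

    Hypothetical helper: the name and default values are illustrative only.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The upper bound doubles on each attempt and is capped at
            # max_delay; the actual wait is a random value below that bound.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))


# Possible usage with the publish call from the question:
#   from google.gax.errors import GaxError
#   retry_with_backoff(lambda: topic.publish(dat), retriable=(GaxError,))
```

Restricting `retriable` to the error type seen in the traceback keeps unrelated failures (for example, a bad topic name) from being retried silently.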

I found this discussion, which describes a potential bug in _gax_python, but it does not seem to be related. Any ideas about what we are doing wrong here?

Full error:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pp/pp/pp/process/uploader.py", line 145, in upload_thread
    topic.publish(byte_string)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/pubsub/topic.py", line 257, in publish
    message_ids = api.topic_publish(self.full_name, [message_data])
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/pubsub/_gax.py", line 165, in topic_publish
    options=options)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/gapic/pubsub/v1/publisher_api.py", line 289, in publish
    return self._publish(request, options)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 481, in inner
    return api_caller(api_call, this_settings, request)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 158, in inner
    return a_func(request, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 434, in inner
    errors.create_error('RPC failed', cause=exception))
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/future/utils/__init__.py", line 419, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 430, in inner
    return a_func(*args, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 64, in inner
    return a_func(*updated_args, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/grpc/_channel.py", line 481, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/grpc/_channel.py", line 432, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1478711654.067744009","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478711654.067706801","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>

3 answers:

Answer 0 (score: 3)

It looks like the relevant discussion you are looking for is issue 2683, "Frequent gRPC StatusCode.UNAVAILABLE errors".

You are not doing anything wrong. Catching the exception and retrying appears to be the most appropriate workaround for now.

Answer 1 (score: 0)

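If the topic is a global variable, the errors stop. Make the topic a class variable and instantiate it only once, so that this line is called only once: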

topic = pubsub.Client().topic(name)

Also, this seems to work only on Python 2.7. On Python 3.6, retrying eases the pain somewhat.

Disabling gRPC fixes the problem on Python 3.6. This can be done by setting an environment variable:

ENV GOOGLE_CLOUD_DISABLE_GRPC=true
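The `ENV` line above looks like Dockerfile syntax. Outside a container, the same variable could be set from Python itself, as in the sketch below. It rests on an assumption worth checking against the installed library version: that the client reads `GOOGLE_CLOUD_DISABLE_GRPC` once, at import time, so the variable has to be set before any `google.cloud` module is imported.

```python
import os

# Assumption: the library checks this variable when it is first imported,
# so it must be set before the google.cloud import below runs.
# setdefault leaves any value already provided by the environment untouched.
os.environ.setdefault("GOOGLE_CLOUD_DISABLE_GRPC", "true")

# Import the client only after the variable is in place, e.g.:
#   from google.cloud import pubsub
```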

Answer 2 (score: 0)

I managed to come up with a "not so pretty" workaround: a policy that replicates the deadline_exceeded handling in google.cloud.pubsub_v1.subscriber.policy.thread.Policy.on_exception.

from google.cloud.pubsub_v1.subscriber.policy.thread import Policy
import grpc

class UnavailableHackPolicy(Policy):
    def on_exception(self, exception):
        """
        There is an issue in the gRPC channel that raises an UNAVAILABLE exception now and
        then. Until that issue is fixed, we need to protect our consumer thread from breaking.
        https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2683
        """
        unavailable = grpc.StatusCode.UNAVAILABLE
        if getattr(exception, 'code', lambda: None)() in [unavailable]:
            print("¡OrbitalHack! - {}".format(exception))
            return
        return super(UnavailableHackPolicy, self).on_exception(exception)

In the function that receives messages, I have code like this:

subscriber = pubsub.SubscriberClient(policy_class=UnavailableHackPolicy)
subscription_path = subscriber.subscription_path(project, subscription_name)
subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)

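The problem is that when the resource really is unavailable, we will not notice. But until the gRPC development team manages to fix this issue, we will use this workaround.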
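One way to soften that limitation might be to ignore UNAVAILABLE only while the errors are short-lived, and to stop ignoring them once they persist. The sketch below is an illustration under assumptions, not part of the original workaround: the class name `UnavailableTracker`, its methods, and the 60-second threshold are all hypothetical.

```python
import time


class UnavailableTracker:
    """Tolerate brief runs of UNAVAILABLE errors, flag sustained ones.

    Hypothetical helper; the name and the default threshold are illustrative.
    """

    def __init__(self, max_outage_seconds=60.0, clock=time.monotonic):
        self.max_outage_seconds = max_outage_seconds
        self._clock = clock
        self._first_failure = None  # start time of the current failure run

    def record_success(self):
        """Reset the failure run, e.g. whenever a message is received."""
        self._first_failure = None

    def record_unavailable(self):
        """Return True while the error can still be treated as transient."""
        now = self._clock()
        if self._first_failure is None:
            self._first_failure = now
        return (now - self._first_failure) < self.max_outage_seconds
```

In `on_exception`, the policy could return early only when `record_unavailable()` is true and otherwise fall through to the parent handler, with the message callback calling `record_success()`. The class is not synchronized, so sharing one instance across threads would need a lock.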