After an event occurs on one of our distributed systems, we try to do a simple publish to an existing topic.
The code is as follows:
try:
    dat = data.encode('utf-8')
    topic.publish(dat)
except:
    <code to recover>
If we catch all exceptions and print the traceback, we get:
google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1478711654.067744009","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478711654.067706801","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>
(full error below)
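Looking at http://gcloud-python.readthedocs.io/en/latest/pubsub-topic.html#google.cloud.pubsub.topic.Topic.publish, this GaxError does not look like something we should have to worry about. However, if we do catch the error and retry with exponential backoff, the publish usually succeeds on the second attempt.
I found this discussion, and although it describes a potential bug in _gax_python, it does not seem to be related. Any ideas about what we are doing wrong here?
Full error: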
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/pp/pp/pp/process/uploader.py", line 145, in upload_thread
    topic.publish(byte_string)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/pubsub/topic.py", line 257, in publish
    message_ids = api.topic_publish(self.full_name, [message_data])
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/pubsub/_gax.py", line 165, in topic_publish
    options=options)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/cloud/gapic/pubsub/v1/publisher_api.py", line 289, in publish
    return self._publish(request, options)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 481, in inner
    return api_caller(api_call, this_settings, request)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 158, in inner
    return a_func(request, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 434, in inner
    errors.create_error('RPC failed', cause=exception))
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/future/utils/__init__.py", line 419, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 430, in inner
    return a_func(*args, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/google/gax/api_callable.py", line 64, in inner
    return a_func(*updated_args, **kwargs)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/grpc/_channel.py", line 481, in __call__
    return _end_unary_response_blocking(state, False, deadline)
  File "/home/pp/.virtualenvs/cv/lib/python3.5/site-packages/grpc/_channel.py", line 432, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
google.gax.errors.GaxError: GaxError(RPC failed, caused by <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, {"created":"@1478711654.067744009","description":"Secure read failed","file":"src/core/lib/security/transport/secure_endpoint.c","file_line":157,"grpc_status":14,"referenced_errors":[{"created":"@1478711654.067706801","description":"EOF","file":"src/core/lib/iomgr/tcp_posix.c","file_line":235}]})>
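Answer 0 (score: 3)
It looks like the relevant discussion you are looking for is issue 2683, "Frequent gRPC StatusCode.UNAVAILABLE errors".
You are not doing anything wrong. Catching the exception and retrying seems to be the most appropriate workaround for now.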
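The catch-and-retry workaround might look roughly like the sketch below. It is not part of the library: `publish_with_retry`, `is_transient`, and the delay parameters are hypothetical names chosen for illustration, and matching on the text `UNAVAILABLE` is an assumption based on the error message shown in the question.

```python
import random
import time


def is_transient(exc):
    """Assumption: any error whose text mentions UNAVAILABLE is worth retrying."""
    return "UNAVAILABLE" in str(exc)


def publish_with_retry(publish, payload, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call publish(payload), retrying transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return publish(payload)
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_transient(exc):
                raise  # give up: out of attempts, or not a transient error
            # Double the delay on each attempt, cap it, and add jitter so
            # several workers do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

With the question's code, the call would be something like publish_with_retry(topic.publish, data.encode('utf-8')). Errors that are not considered transient, and the last failure once the attempts run out, are re-raised so the caller still sees them.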
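Answer 1 (score: 0)
If the topic is a global variable, the errors stop. Make the topic a class variable and instantiate it only once, that is, call this line only once: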
topic = pubsub.Client().topic(name)
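Also, this seems to work only on Python 2.7. On Python 3.6, retrying numbs the pain a bit.
Disabling gRPC resolves the problem on Python 3.6. This can be done by setting an environment variable, shown here in Dockerfile syntax: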
ENV GOOGLE_CLOUD_DISABLE_GRPC=true
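The "instantiate the topic only once" advice from this answer could be sketched as a small cache that builds each topic on first use and hands back the same object afterwards. `TopicCache` and its `factory` argument are hypothetical names, not part of the Pub/Sub library. The lock is there because the question's traceback shows publishing from a worker thread; whether a shared topic object is itself safe to use from several threads is not stated in the answer.

```python
import threading


class TopicCache:
    """Create each topic once and reuse it (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory      # callable: topic name -> topic object
        self._lock = threading.Lock()
        self._topics = {}

    def get(self, name):
        # The lock ensures two threads cannot both create the same topic.
        with self._lock:
            if name not in self._topics:
                self._topics[name] = self._factory(name)
            return self._topics[name]
```

A module-level instance such as topics = TopicCache(lambda name: pubsub.Client().topic(name)) would then replace repeated calls to pubsub.Client().topic(name) with topics.get(name).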
Answer 2 (score: 0)
I managed to get a "not so pretty" workaround: use a policy that replicates the deadline_exceeded code in google.cloud.pubsub_v1.subscriber.policy.thread.Policy.on_exception.
import grpc
from google.cloud.pubsub_v1.subscriber.policy.thread import Policy


class UnavailableHackPolicy(Policy):

    def on_exception(self, exception):
        """
        There is an issue in the gRPC channel that raises an UNAVAILABLE
        exception now and then. Until that issue is fixed, we need to
        protect our consumer thread from breaking.
        https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2683
        """
        unavailable = grpc.StatusCode.UNAVAILABLE
        # Swallow UNAVAILABLE so the consumer thread keeps running.
        if getattr(exception, 'code', lambda: None)() in [unavailable]:
            print("¡OrbitalHack! - {}".format(exception))
            return
        # Everything else goes through the default handling.
        return super(UnavailableHackPolicy, self).on_exception(exception)
In the function that receives messages, I have code like this:
subscriber = pubsub.SubscriberClient(policy_class=UnavailableHackPolicy)
subscription_path = subscriber.subscription_path(project, subscription_name)
subscriber.subscribe(subscription_path, callback=callback, flow_control=flow_control)
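The problem is that when the resource is truly unavailable, we will not notice. Still, until the gRPC developer team manages to fix this issue, we will use this workaround.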
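One possible way to soften that drawback is to tolerate only a bounded run of consecutive UNAVAILABLE errors and stop swallowing them once the run gets too long. The sketch below is an assumption layered on top of the workaround, not something the library provides: `UnavailableBudget`, `limit`, `should_ignore`, and `record_success` are hypothetical names.

```python
class UnavailableBudget:
    """Tolerate up to `limit` consecutive UNAVAILABLE errors (hypothetical helper)."""

    def __init__(self, limit=5):
        self.limit = limit
        self.consecutive = 0

    def record_success(self):
        # Any successfully handled message resets the run of failures.
        self.consecutive = 0

    def should_ignore(self):
        """Count one UNAVAILABLE error; return True while within the budget."""
        self.consecutive += 1
        return self.consecutive <= self.limit
```

Under this assumption, on_exception would return early only while should_ignore() is true and otherwise fall through to the parent handler, and the message callback would call record_success() after each message it processes.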