Django, Apache2 on Google Kubernetes Engine writing Opencensus traces to Stackdriver Trace

Posted: 2019-02-08 17:25:59

Tags: django apache2 mod-wsgi google-cloud-stackdriver opencensus

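I have a Django web app served by Apache2 with mod_wsgi in Docker containers. The containers run on a Kubernetes cluster in Google Cloud Platform and are protected by Identity-Aware Proxy. Everything works fine, but I want to send GCP Stackdriver traces for all requests without writing one for each view in my project. I found middleware that handles this using Opencensus. I went through this documentation and was able to manually generate traces that were exported to Stackdriver Trace in my project. To do that, I specified the StackdriverExporter and passed my project's Google Cloud Platform Project Number as the project_id parameter.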

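Now, to make this automatic for all requests, I followed the instructions to set up the middleware. In settings.py, I added the module to INSTALLED_APPS and MIDDLEWARE and set up the OPENCENSUS_TRACE dictionary of options. I also added OPENCENSUS_TRACE_PARAMS. This works great with the default exporter 'opencensus.trace.exporters.print_exporter.PrintExporter'. With that exporter, I can see the trace and span information in my Apache2 web server logs, including the trace ID and all the other details. However, I want to send them to Stackdriver Trace for analysis.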

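I tried setting the EXPORTER parameter to opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter. This works when run manually from the shell, as long as you supply the project number.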

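When it is set to use StackdriverExporter, the web page doesn't load and the health check starts failing. Eventually the page returns a 502 error saying I should try again in 30 seconds. I believe the Identity-Aware Proxy generates this error once it detects the failed health check. However, the server generates no errors, and nothing appears in the Apache2 access or error logs.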

There is another dictionary in settings.py named OPENCENSUS_TRACE_PARAMS, which I assume determines which project number the exporter should use. The example sets GCP_EXPORTER_PROJECT to None and SERVICE_NAME to 'my_service'.

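What options do I need to set so that the exporter sends traces to Stackdriver instead of printing them to the logs? Do you have any idea how I can set this up?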

settings.py

MIDDLEWARE = (
    ...
    'opencensus.trace.ext.django.middleware.OpencensusMiddleware',
)

INSTALLED_APPS = (
    ...
    'opencensus.trace.ext.django',
)

OPENCENSUS_TRACE = {
    'SAMPLER': 'opencensus.trace.samplers.probability.ProbabilitySampler',
    'EXPORTER': 'opencensus.trace.exporters.stackdriver_exporter.StackdriverExporter',  # This one just makes the server hang with no response or error and kills the health check.
    'PROPAGATOR': 'opencensus.trace.propagation.google_cloud_format.GoogleCloudFormatPropagator',
    # 'EXPORTER': 'opencensus.trace.exporters.print_exporter.PrintExporter',  # This one works to print the Trace and Span with IDs and details in the logs.
}

OPENCENSUS_TRACE_PARAMS = {
    'BLACKLIST_PATHS': ['/health'],
    'GCP_EXPORTER_PROJECT': 'my_project_number',  # Should this be None like the example, or Project ID, or Project Number?
    'SAMPLING_RATE': 0.5,
    'SERVICE_NAME': 'my_service',  # Not sure if this is my app name or some other service name.
    'ZIPKIN_EXPORTER_HOST_NAME': 'localhost',  # Are the following even necessary, or are they causing a failure that is not detected by Apache2?
    'ZIPKIN_EXPORTER_PORT': 9411,
    'ZIPKIN_EXPORTER_PROTOCOL': 'http',
    'JAEGER_EXPORTER_HOST_NAME': None,
    'JAEGER_EXPORTER_PORT': None,
    'JAEGER_EXPORTER_AGENT_HOST_NAME': 'localhost',
    'JAEGER_EXPORTER_AGENT_PORT': 6831
}

Here's an example of an Apache2 log entry when the exporter is set to PrintExporter. I've formatted it for readability:

[Fri Feb 08 09:00:32.427575 2019]
[wsgi:error]
[pid 1097:tid 139801302882048]
[client 10.48.0.1:43988]
[SpanData(
  name='services.views.my_view', 
  context=SpanContext(
    trace_id=e882f23e49e34fc09df621867d753532,
    span_id=None,
    trace_options=TraceOptions(enabled=True),
    tracestate=None
  ),
  span_id='bcbe7b96906a482a',
  parent_span_id=None,
  attributes={
    'http.status_code': '200',
    'http.method': 'GET',
    'http.url': '/',
    'django.user.name': ''
  },
  start_time='2019-02-08T17:00:29.845733Z',
  end_time='2019-02-08T17:00:32.427455Z',
  child_span_count=0,
  stack_trace=None,
  time_events=[],
  links=[],
  status=None,
  same_process_as_parent_span=None,
  span_kind=1
)]

Thanks in advance for any tips, help, or troubleshooting advice!

Edit 2019-02-08 6:56 UTC:

I found this in the middleware:

# Initialize the exporter
transport = convert_to_import(settings.params.get(TRANSPORT))

if self._exporter.__name__ == 'GoogleCloudExporter':
    _project_id = settings.params.get(GCP_EXPORTER_PROJECT, None)
    self.exporter = self._exporter(
        project_id=_project_id,
        transport=transport)
elif self._exporter.__name__ == 'ZipkinExporter':
    _service_name = self._get_service_name(settings.params)
    _zipkin_host_name = settings.params.get(
        ZIPKIN_EXPORTER_HOST_NAME, 'localhost')
    _zipkin_port = settings.params.get(
        ZIPKIN_EXPORTER_PORT, 9411)
    _zipkin_protocol = settings.params.get(
        ZIPKIN_EXPORTER_PROTOCOL, 'http')
    self.exporter = self._exporter(
        service_name=_service_name,
        host_name=_zipkin_host_name,
        port=_zipkin_port,
        protocol=_zipkin_protocol,
        transport=transport)
elif self._exporter.__name__ == 'TraceExporter':
    _service_name = self._get_service_name(settings.params)
    _endpoint = settings.params.get(
        OCAGENT_TRACE_EXPORTER_ENDPOINT, None)
    self.exporter = self._exporter(
        service_name=_service_name,
        endpoint=_endpoint,
        transport=transport)
elif self._exporter.__name__ == 'JaegerExporter':
    _service_name = self._get_service_name(settings.params)
    self.exporter = self._exporter(
        service_name=_service_name,
        transport=transport)
else:
    self.exporter = self._exporter(transport=transport)

The exporter is now named StackdriverExporter instead of GoogleCloudExporter. I created a class named GoogleCloudExporter in my app that inherits from StackdriverExporter, and I updated settings.py to use GoogleCloudExporter. It didn't seem to work. I wonder whether other code still references these old names, possibly for the transport. I'm searching the source code for clues... At least this tells me I can get rid of the ZIPKIN and JAEGER parameter options, since the EXPORTER parameter determines which parameters are used.
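The name-based branching in the middleware code above can be reduced to a small stdlib-only sketch. The exporter classes here are hypothetical stand-ins, not the real opencensus classes. The Zipkin, OC Agent and Jaeger branches are left out. The sketch only shows that, in this version of the middleware, the exporter's class name decides whether GCP_EXPORTER_PROJECT is passed as project_id. A class named StackdriverExporter falls through to the final else branch and receives only transport.

```python
# Hypothetical stand-ins for the opencensus exporter classes: only the
# class names matter for the name-based dispatch being illustrated.
class StackdriverExporter:
    def __init__(self, project_id=None, transport=None):
        self.project_id = project_id
        self.transport = transport


class GoogleCloudExporter(StackdriverExporter):
    """Subclass whose __name__ matches the middleware's 'GoogleCloudExporter' check."""


def build_exporter(exporter_cls, params, transport=None):
    """Simplified mirror of the quoted middleware branches (Zipkin/OC Agent/Jaeger omitted)."""
    if exporter_cls.__name__ == 'GoogleCloudExporter':
        # Only this branch reads GCP_EXPORTER_PROJECT.
        return exporter_cls(
            project_id=params.get('GCP_EXPORTER_PROJECT', None),
            transport=transport)
    # Every other class name falls through to the generic else branch.
    return exporter_cls(transport=transport)


# The question's OPENCENSUS_TRACE_PARAMS with the Zipkin and Jaeger keys removed.
OPENCENSUS_TRACE_PARAMS = {
    'BLACKLIST_PATHS': ['/health'],
    'GCP_EXPORTER_PROJECT': 'my_project_number',
    'SAMPLING_RATE': 0.5,
    'SERVICE_NAME': 'my_service',
}

plain = build_exporter(StackdriverExporter, OPENCENSUS_TRACE_PARAMS)
aliased = build_exporter(GoogleCloudExporter, OPENCENSUS_TRACE_PARAMS)
print(plain.project_id)    # None
print(aliased.project_id)  # my_project_number
```

This reading does not explain the hang by itself. The later edit reports that the same exporter works under Django's built-in server, and the alias subclass alone did not fix the problem.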

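Edit 2019-02-08 11:58 UTC:

I scrapped Apache2 to isolate the problem and set the Docker image to use Django's built-in web server instead, and it works! When I go to the site, it writes a trace to Stackdriver Trace for each request. The span name is the module and method being executed.

Somehow Apache2 is not allowed to send these, yet I can send them from the shell when running as root. I've added the apache2 and mod-wsgi tags to the question. I have a funny feeling this has to do with forking child processes in Apache2 and mod_wsgi. Could it be that Apache2's child processes are sandboxed and can't create child processes of their own, or is this a permissions issue? It seems strange, because as far as I know it's just calling Python modules, not any external OS binaries. Any other ideas would be greatly appreciated!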
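To probe the forking hypothesis in isolation, a small stdlib-only sketch can show one relevant POSIX rule. It assumes a POSIX system, since it uses os.fork. The rule: fork() copies only the calling thread, so a background thread started in the parent does not exist in the child. The daemon thread here is a hypothetical stand-in for any worker a library might start before a fork. The sketch does not show that this is what happens under mod_wsgi. It only illustrates why state created before a fork can behave differently in child processes.

```python
import os
import threading


def thread_counts_across_fork():
    """Return (threads in parent, threads in child) after forking with a worker thread running.

    Requires a POSIX system (os.fork). The daemon thread stands in for any
    background worker that exists before the fork.
    """
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: fork() copied only the thread that called it.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os.close(write_fd)
        os._exit(0)

    # Parent process: collect the child's count, then clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    parent_count = threading.active_count()
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(thread_counts_across_fork())
```

The parent still sees its worker thread, while the child reports only the thread that called fork().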

1 answer:

Answer 0 (score: 0)

I ran into this problem when using gunicorn with gevent as the worker class. To fix it and get Cloud Trace working, the solution was to monkey-patch grpc like this:

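from gevent import monkey
monkey.patch_all()  # patch the standard library as early as possible, before other imports

import grpc.experimental.gevent as grpc_gevent
grpc_gevent.init_gevent()  # make gRPC compatible with gevent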

See https://github.com/grpc/grpc/issues/4629#issuecomment-376962677