Monitoring Dataflow pipelines with Datadog

Time: 2017-11-30 15:45:51

Tags: google-cloud-dataflow datadog

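I am looking for a way to monitor GCP Dataflow pipelines with Datadog, pulling in both the built-in metrics and Beam custom metrics. Datadog currently offers integrations for other GCP services, but not for Dataflow. Has anyone done something similar and can share pointers on how to build this as a custom solution?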

1 answer:

Answer 0 (score: 0)

At the moment I see only two possibilities:

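  1. Write GCP custom metrics with the client from google-cloud-clients / google-cloud-monitoring, and let Datadog's Stackdriver integration pull them in
  2. Deploy a Datadog agent in the cloud and connect to it with a Datadog StatsD client (Java, Python, Go)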

  1. Use GCP custom metrics (https://cloud.google.com/monitoring/custom-metrics/creating-metrics) together with Datadog's GCP integration (https://www.datadoghq.com/product/integrations/#cat-google-cloud):

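    import com.google.api.MetricDescriptor;
    import com.google.cloud.monitoring.v3.MetricServiceClient;
    import com.google.monitoring.v3.CreateMetricDescriptorRequest;
    import com.google.monitoring.v3.ProjectName;
    
    // projectId and metricType are assumed to be defined by the caller;
    // custom metric types start with "custom.googleapis.com/".
    final MetricServiceClient client = MetricServiceClient.create();
    ProjectName name = ProjectName.of(projectId);
    
    // Describe a double-valued gauge metric.
    MetricDescriptor descriptor = MetricDescriptor.newBuilder()
        .setType(metricType)
        .setDescription("This is a simple example of a custom metric.")
        .setMetricKind(MetricDescriptor.MetricKind.GAUGE)
        .setValueType(MetricDescriptor.ValueType.DOUBLE)
        .build();
    
    CreateMetricDescriptorRequest request = CreateMetricDescriptorRequest.newBuilder()
        .setName(name.toString())
        .setMetricDescriptor(descriptor)
        .build();
    
    // Register the descriptor in the project.
    client.createMetricDescriptor(request);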
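
The snippet above does not show what `metricType` should contain. Cloud Monitoring expects user-defined metric types to start with `custom.googleapis.com/`. As a minimal sketch, assuming you want to derive the type from a Beam metric's namespace and name, a helper along these lines could build it. The class name `CustomMetricType`, the `dataflow/` path segment, and the sanitizing rule are all illustrative assumptions, not part of any library.

```java
import java.util.Locale;

public class CustomMetricType {
    // Prefix that Cloud Monitoring requires for user-defined metric types.
    static final String PREFIX = "custom.googleapis.com/";

    // Hypothetical helper: builds a metric type from a Beam metric namespace and name.
    static String of(String namespace, String name) {
        return PREFIX + "dataflow/" + sanitize(namespace) + "/" + sanitize(name);
    }

    // Illustrative rule (an assumption): lowercase, then replace anything outside [a-z0-9_] with '_'.
    static String sanitize(String part) {
        return part.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_]", "_");
    }

    public static void main(String[] args) {
        // prints "custom.googleapis.com/dataflow/mypipeline/parsed_records"
        System.out.println(of("MyPipeline", "parsed-records"));
    }
}
```

The resulting string could then be passed to `setType(metricType)` when building the descriptor.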
    
  2. Use a Datadog StatsD client (the Java one is at https://github.com/DataDog/java-dogstatsd-client), deploy a Datadog agent on GCP, and connect to the agent through the client. An example using Kubernetes: https://docs.datadoghq.com/tracing/setup/kubernetes/#deploy-agent-daemonset

    import com.timgroup.statsd.ServiceCheck;
    import com.timgroup.statsd.StatsDClient;
    import com.timgroup.statsd.NonBlockingStatsDClient;
    
    public class Foo {
    
      private static final StatsDClient statsd = new NonBlockingStatsDClient(
        "my.prefix",                          /* prefix to any stats; may be null or empty string */
        "statsd-host",                        /* common case: localhost */
        8125,                                 /* port */
        new String[] {"tag:value"}            /* Datadog extension: Constant tags, always applied */
      );
    
      public static final void main(String[] args) {
        statsd.incrementCounter("foo");
        statsd.recordGaugeValue("bar", 100);
        statsd.recordGaugeValue("baz", 0.01); /* DataDog extension: support for floating-point gauges */
        statsd.recordHistogramValue("qux", 15);     /* DataDog extension: histograms */
        statsd.recordHistogramValue("qux", 15.5);   /* ...also floating-point */
        statsd.recordDistributionValue("qux", 15);     /* DataDog extension: global distributions */
        statsd.recordDistributionValue("qux", 15.5);   /* ...also floating-point */
    
        ServiceCheck sc = ServiceCheck
              .builder()
              .withName("my.check.name")
              .withStatus(ServiceCheck.Status.OK)
              .build();
        statsd.serviceCheck(sc); /* Datadog extension: send service check status */
    
        /* Compatibility note: Unlike upstream statsd, DataDog expects execution times to be a
         * floating-point value in seconds, not a millisecond value. This library
         * does the conversion from ms to fractional seconds.
         */
        statsd.recordExecutionTime("bag", 25, "cluster:foo"); /* DataDog extension: cluster tag */
      }
    }
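
For reference, what the client sends to the agent on port 8125 is a small plain-text UDP datagram. The sketch below formats and sends one by hand using only the JDK. It is meant to show the DogStatsD datagram layout (`name:value|type|#tags`), not to replace the client library. The class name `RawDogStatsd`, the metric names, and the `localhost` target are assumptions for illustration.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class RawDogStatsd {
    // Formats one DogStatsD datagram: <name>:<value>|<type>[|#tag1,tag2]
    // Common types: "c" counter, "g" gauge, "h" histogram, "d" distribution, "ms" timer.
    static String format(String name, String value, String type, String... tags) {
        StringBuilder sb = new StringBuilder(name)
            .append(':').append(value)
            .append('|').append(type);
        if (tags.length > 0) {
            sb.append("|#").append(String.join(",", tags));
        }
        return sb.toString();
    }

    // Sends the datagram over UDP. UDP is fire-and-forget, so a successful
    // send does not confirm that an agent actually received the metric.
    static void send(String host, int port, String datagram) throws Exception {
        byte[] payload = datagram.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length, InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        String counter = format("my.prefix.foo", "1", "c", "tag:value");
        System.out.println(counter); // prints "my.prefix.foo:1|c|#tag:value"
        send("localhost", 8125, counter);
    }
}
```

Because the transport is plain UDP, the pipeline workers only need network access to the agent's host and port; no Datadog API key is needed on the client side.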
    

    Datadog deployment.yaml for Kubernetes:

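    # apps/v1 replaces extensions/v1beta1, which Kubernetes 1.16 stopped serving for DaemonSets
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: datadog-agent
    spec:
      # apps/v1 requires an explicit selector that matches the pod template labels
      selector:
        matchLabels:
          app: datadog-agent
      template: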
        metadata:
          labels:
            app: datadog-agent
          name: datadog-agent
        spec:
          serviceAccountName: datadog-agent
          containers:
          - image: datadog/agent:latest
            imagePullPolicy: Always
            name: datadog-agent
            ports:
              - containerPort: 8125
                # hostPort: 8125
                name: dogstatsdport
                protocol: UDP
              - containerPort: 8126
                # hostPort: 8126
                name: traceport
                protocol: TCP
            env:
              - name: DD_APM_ENABLED
                value: "true"
              - name: DD_API_KEY
                value: "<YOUR_API_KEY>"
              - name: DD_COLLECT_KUBERNETES_EVENTS
                value: "true"
              - name: DD_LEADER_ELECTION
                value: "true"
              - name: KUBERNETES
                value: "yes"
              - name: DD_KUBERNETES_KUBELET_HOST
                valueFrom:
                  fieldRef:
                    fieldPath: status.hostIP
            resources:
              requests:
                memory: "256Mi"
                cpu: "200m"
              limits:
                memory: "256Mi"
                cpu: "200m"
            volumeMounts:
              - name: dockersocket
                mountPath: /var/run/docker.sock
              - name: procdir
                mountPath: /host/proc
                readOnly: true
              - name: cgroups
                mountPath: /host/sys/fs/cgroup
                readOnly: true
            livenessProbe:
              exec:
                command:
                - ./probe.sh
              initialDelaySeconds: 15
              periodSeconds: 5
          volumes:
            - hostPath:
                path: /var/run/docker.sock
              name: dockersocket
            - hostPath:
                path: /proc
              name: procdir
            - hostPath:
                path: /sys/fs/cgroup
              name: cgroups
    

I am still investigating this myself, so I am not yet sure how best to do it.