Disabling CloudWatch for AWS Kinesis in Spark Streaming

Time: 2017-01-23 16:28:15

Tags: apache-spark spark-streaming amazon-kinesis

I would like to know whether this is possible.

Here is the code. I obtain numStreams through the AmazonKinesisClient API:

    // Create the Kinesis DStreams
    List<JavaDStream<byte[]>> streamsList = new ArrayList<>(numStreams);
    for (int i = 0; i < numStreams; i++) {
      streamsList.add(
          KinesisUtils.createStream(jssc, kinesisAppName, streamName, endpointUrl, regionName,
              InitialPositionInStream.TRIM_HORIZON, kinesisCheckpointInterval,
              StorageLevel.MEMORY_AND_DISK_2(), accessesKey, secretKey)
      );
    }

I looked through the API, but I could not find any reference to disabling CloudWatch for Spark Streaming.

This is the warning I am trying to get rid of:

    17/01/23 17:46:29 WARN CWPublisherRunnable: Could not publish 16 datums to CloudWatch
    com.amazonaws.AmazonServiceException: User: arn:aws:iam:::user/Kinesis_Service is not authorized to perform: cloudwatch:PutMetricData (Service: AmazonCloudWatch; Status Code: 403; Error Code: AccessDenied; Request ID: *****)
        at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1377)
        at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:923)
        at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:701)
        at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:453)
        at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:415)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:364)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:984)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:954)
        at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.putMetricData(AmazonCloudWatchClient.java:853)
        at com.amazonaws.services.kinesis.metrics.impl.DefaultCWMetricsPublisher.publishMetrics(DefaultCWMetricsPublisher.java:63)
        at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:144)
        at com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.run(CWPublisherRunnable.java:90)
        at java.lang.Thread.run(Unknown Source)

1 Answer:

Answer 0: (score: 1)

Preface: I know this is an old question, but I just ran into the same problem, so I am posting a solution for anyone who hits it on Spark <= 2.3.3.

CloudWatch metrics reporting can be disabled at the KCL (Kinesis Client Library) level by using the withMetrics methods when building the client.

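Unfortunately, the Spark KinesisInputDStream API does not expose a way to change this setting. To make matters worse, the default level is DETAILED, which sends tens of metrics every 10 seconds.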

The way I disabled it was to pass invalid credentials to the cloudWatchCredentials method of the KinesisInputDStream builder, i.e.:

    .cloudWatchCredentials(SparkAWSCredentials.builder.basicCredentials("DISABLED", "DISABLED").build())

That leaves the problem of CloudWatchAsyncClient logging a warning on every tick, which I silenced by setting the following in the Spark log4j.properties configuration:

# Set Kinesis metrics logging to ERROR. We intentionally provide wrong
# credentials in order to disable CloudWatch reporting, and the resulting
# bad-credential warnings are logged at WARN level, so this hides them
# while errors still get through.
log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR

This suppresses only the warnings from classes in the metrics package (such as the one you mentioned), not errors (in case you need them).
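To illustrate why a package-level threshold hides the WARN entries but still lets errors through, here is a minimal, self-contained sketch. Note that it uses java.util.logging from the JDK rather than log4j, purely so it runs without extra dependencies. The class name MetricsLogLevelSketch, the CapturingHandler helper and the "simulated failure" message are made up for the illustration. The level names also differ: WARNING and SEVERE here stand in for log4j's WARN and ERROR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustration only: java.util.logging stands in for log4j here.
class MetricsLogLevelSketch {

    // Hypothetical helper that records every message that passes the level check.
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + ": " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    static List<String> run() {
        // Package-level logger: the analogue of
        // log4j.logger.com.amazonaws.services.kinesis.metrics=ERROR
        Logger pkg = Logger.getLogger("com.amazonaws.services.kinesis.metrics");
        pkg.setLevel(Level.SEVERE);
        pkg.setUseParentHandlers(false); // keep output away from the root console handler
        CapturingHandler handler = new CapturingHandler();
        pkg.addHandler(handler);
        try {
            // A logger named after a class inside that package has no level of its
            // own, so it inherits the effective level from the package logger.
            Logger child = Logger.getLogger(
                    "com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable");
            child.warning("Could not publish 16 datums to CloudWatch"); // below threshold, dropped
            child.severe("simulated failure");                          // at threshold, kept
            return new ArrayList<>(handler.messages);
        } finally {
            pkg.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running main prints only the SEVERE entry. The warning never reaches the handler because the child logger inherits the stricter level from its package-level parent, which is the same hierarchy rule the log4j.properties line above relies on.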

This is far from an ideal solution, but it let us deploy a fix while staying on our existing Spark version.

Next step: open a ticket with Spark so that, hopefully, this can be disabled properly in a future release.

Edit: created https://issues.apache.org/jira/browse/SPARK-27420 for tracking.