When I use Spark 2.4's Spark Streaming to consume Kafka, I find that the logs outside my foreachRDD method are printed, but the logs inside foreachRDD are not. The logging API I am using is log4j, version 1.2.
I tried adding
spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties
spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties
to the spark-defaults.properties configuration file. At first I had written a wrong path there, which caused an error message about the log level and the log configuration file path to be printed, so I know the spark.executor.extraJavaOptions and spark.driver.extraJavaOptions settings do take effect.
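Based on the Spark-on-YARN documentation, my understanding is that the custom log4j.properties also has to be shipped to the containers, e.g. with --files, so that the file: URL resolves in each container's working directory. A rough sketch of what I believe the submit command should look like (the paths, class name, jar name and program arguments below are placeholders, not my real ones):
<code>
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /local/path/to/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --class com.example.KafkaSparkStreamingKafkaTests \
  my-app.jar arg0 arg1 arg2 consumer1
</code>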
Answer 0 (score: 1)
The log statements inside and outside the foreach block are executed on different machines: the ones outside run on the driver, and the ones inside run on the executors. So if you want to see the logs written in the foreach block, check YARN for the executor logs.
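For example, with log aggregation enabled on YARN, the executor logs can be fetched with the YARN CLI (the application id below is a placeholder):
<code>
yarn logs -applicationId <application_id>
</code>
The per-executor stdout/stderr files are also reachable from the Executors tab of the Spark web UI.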
Answer 1 (score: 0)
<code>
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/vdir/mnt/disk2/hadoop/yarn/local/usercache/root/filecache/494/__spark_libs__3795396964941241866.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/01/10 14:17:16 ERROR KafkaSparkStreamingKafkaTests: receive+++++++++++++++++++++++++++++++
</code>
My code:
<code>
// 1. Driver side: this logger.error call runs on the driver and does get printed
if (args[3].equals("consumer1")) {
    logger.error("receive+++++++++++++++++++++++++++++++");
    SparkSQLService sparkSQLService = new SparkSQLService();
    consumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer1");
    sparkSQLService.sparkForwardedToKafka(sparkConf,
            CONSUMER_TOPIC,
            PRODUCER_TOPIC,
            new HashMap<String, Object>((Map) consumerProperties));
    ......

// 2. Streaming job: the LOGGER.error call inside foreachRDD/foreachPartition runs on the executors
public void sparkForwardedToKafka(SparkConf sparkConf, String consumerTopic, String producerTopic,
                                  Map<String, Object> kafkaConsumerParamsMap) {
    sparkConf.registerKryoClasses(new Class[]{SparkSQLService.class, FlatMapFunction.class,
            JavaPairInputDStream.class, Logger.class});
    JavaStreamingContext javaStreamingContext =
            new JavaStreamingContext(sparkConf, Durations.milliseconds(DURATION_SECONDS));
    Collection<String> topics = Arrays.asList(consumerTopic);
    JavaInputDStream<ConsumerRecord<String, String>> streams =
            KafkaUtils.createDirectStream(
                    javaStreamingContext,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.Subscribe(topics, kafkaConsumerParamsMap)
            );
    if (producerTopic != null) {
        JavaPairDStream<Long, String> messages =
                streams.mapToPair(record -> new Tuple2<>(record.timestamp(), record.value()));
        messages.foreachRDD(rdd -> {
            rdd.foreachPartition(partition -> {
                partition.forEachRemaining(tuple2 -> {
                    // executed on the executors, so it does not appear in the driver log
                    LOGGER.error("****" + tuple2._1 + "|" + tuple2._2);
                    KafkaService.getInstance().send(producerTopic, TaskContext.get().partitionId(),
                            tuple2._1, null, tuple2._2);
                });
            });
        });
    }
</code>
My logger declaration: private static final Logger LOGGER = LoggerFactory.getLogger(SparkSQLService.class);
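As a quick sanity check (not part of the code above, just a debugging sketch), writing to standard error inside the partition loop should always show up in the executor's stderr file regardless of the log4j configuration, which helps confirm that the records really are being processed on the executors:
<code>
rdd.foreachPartition(partition ->
        partition.forEachRemaining(tuple2 ->
                // lands in the executor's stderr file, viewable from the YARN / Spark UI
                System.err.println("executor got: " + tuple2._1 + "|" + tuple2._2)));
</code>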