Logging Spark driver and executor logs to HDFS via Log4j

Date: 2018-03-12 20:12:52

Tags: apache-spark hdfs

How can I write custom executor logs to HDFS via Log4j? I have tried, but the logs are not created in HDFS. Can anyone confirm whether this is possible at all? My log4j configuration is below.

(Note - we are, however, able to see the custom log messages as part of the executor logs in the Spark History Server UI, which pulls the executor logs from YARN. Those logs are stored in a default HDFS directory in an unreadable format, but they do not use the custom logging directory or custom file I specified below.)
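(For reference, the aggregated logs land under the directory set by yarn.nodemanager.remote-app-log-dir, which defaults to /tmp/logs; the layout and application ID below are only an illustration based on that default:)

# default YARN log-aggregation directory; adjust if yarn.nodemanager.remote-app-log-dir is overridden
hdfs dfs -ls /tmp/logs/$USER/logs/application_1520848924324_0001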

log4j properties below:

log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=hdfs:///tmp/driverlogs/sparker-driver.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=hdfs:///tmp/executorlogs/SparkUser.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.rootLogger=DEBUG,RollingAppender,myConsoleAppender
log4j.logger.myLogger=INFO,RollingAppenderU

log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
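
(A side note on the File settings above: the stock log4j 1.x file appenders only write to the local filesystem and do not understand hdfs:// URIs. A minimal sketch of the alternative suggested in the Spark-on-YARN documentation - pointing the appender at the YARN container log directory so the file is picked up by log aggregation - is shown here; the file name spark-driver.log is just an example:)

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
# ${spark.yarn.app.container.log.dir} is filled in by Spark when running on YARN
log4j.appender.RollingAppender.File=${spark.yarn.app.container.log.dir}/spark-driver.log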

Scala - the Spark program below:

package com.wba.logtest.logtesting

import org.apache.log4j.{Level, LogManager}
import org.apache.spark._
import org.apache.spark.rdd.RDD

class Mapper(n: Int) extends Serializable {
  // @transient + lazy keeps the logger out of the serialized closure;
  // each executor creates its own "myLogger" instance on first use
  @transient lazy val log = LogManager.getLogger("myLogger")

  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map { i =>
      log.info("mapping: " + i)   // logged on the executors
      (i + n).toString
    }
}

object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}

object app {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger   // driver-side logging
    log.setLevel(Level.INFO)

    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)

    log.info("Hello demo")
    val data = sc.parallelize(1 to 1000)
    val mapper = Mapper(1)
    val other = mapper.doSomeMappingOnDataSetAndLogIt(data)
    other.collect()
    log.info("I am done")

    sc.stop()
  }
}
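
(For completeness, a sketch of how a custom log4j.properties is typically shipped to both the driver and the executors when submitting on YARN; the jar name and the assumption that log4j.properties sits in the submit directory are mine:)

spark-submit \
  --class com.wba.logtest.logtesting.app \
  --master yarn \
  --deploy-mode cluster \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  logtesting.jar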


1 Answer:

Answer 0 (score: 1)

The (HDFS) YARN logs are in a readable format - you can fetch them from the command line with yarn logs -applicationId .., passing your Spark application ID.
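
(For example - the application ID below is made up:)

yarn logs -applicationId application_1520848924324_0001 > demo-app.log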

As for the Spark driver logs, it depends on the mode you used to submit the Spark job. In client mode, the driver log goes to standard output. In cluster mode, it is part of the YARN application logs for the application ID that launched the job.
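
(A sketch of the two submission modes; only --deploy-mode changes, and the class/jar names are taken from the question:)

# client mode: the driver runs locally, so its log appears on the stdout of spark-submit
spark-submit --master yarn --deploy-mode client --class com.wba.logtest.logtesting.app logtesting.jar
# cluster mode: the driver runs in a YARN container, so its log is retrieved with "yarn logs"
spark-submit --master yarn --deploy-mode cluster --class com.wba.logtest.logtesting.app logtesting.jar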

Otherwise, another good option is to ship the log messages through a log4j SocketAppender connected to Logstash/Elasticsearch.
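
(A minimal log4j 1.x sketch of that idea; the host and port of the Logstash TCP input are assumptions:)

log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash.example.com
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000
log4j.rootLogger=INFO, logstash

Note that SocketAppender sends serialized Java LoggingEvent objects, so the receiving end (for example the Logstash log4j input) has to understand that format.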