How can I get executor logs written to a custom location in HDFS through Log4j? I tried the configuration below, but the log files are not created in HDFS. Please confirm whether this is possible at all. My log4j configuration is shown below.
(Note: we are able to see the custom log messages as part of the executor logs in the Spark History Server UI, which pulls the executor logs from YARN; those logs are stored in a non-readable format in the default HDFS directory, but they do not go to the custom logging directory or custom file I specify below.)
Log4j properties below:
log4j.appender.myConsoleAppender=org.apache.log4j.ConsoleAppender
log4j.appender.myConsoleAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.myConsoleAppender.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=hdfs:///tmp/driverlogs/sparker-driver.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=hdfs:///tmp/executorlogs/SparkUser.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.rootLogger=DEBUG,RollingAppender,myConsoleAppender
log4j.logger.myLogger=INFO,RollingAppenderU
log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
Scala - Spark program below:
package com.wba.logtest.logtesting
import org.apache.log4j.{Level, LogManager}
import org.apache.spark._
import org.apache.spark.rdd.RDD
class Mapper(n: Int) extends Serializable {
  // Logger is @transient lazy so it is not serialized with the closure
  // and is re-created on each executor when first used.
  @transient lazy val log = org.apache.log4j.LogManager.getLogger("myLogger")

  def doSomeMappingOnDataSetAndLogIt(rdd: RDD[Int]): RDD[String] =
    rdd.map { i =>
      log.info("mapping: " + i)
      (i + n).toString
    }
}

object Mapper {
  def apply(n: Int): Mapper = new Mapper(n)
}

object app {
  def main(args: Array[String]) {
    val log = LogManager.getRootLogger
    log.setLevel(Level.INFO)
    val conf = new SparkConf().setAppName("demo-app")
    val sc = new SparkContext(conf)
    log.info("Hello demo")
    val data = sc.parallelize(1 to 1000)
    val mapper = Mapper(1)
    val other = mapper.doSomeMappingOnDataSetAndLogIt(data)
    other.collect()
    log.info("I am done")
  }
}
Answer 0 (score: 1):
The YARN logs (in HDFS) are in a readable format; you can retrieve them from the command line with yarn logs -applicationId .., passing your Spark application ID.
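For example (the application ID below is only a placeholder; substitute the ID that YARN reports for your job):

yarn logs -applicationId application_1234567890123_0042 > app_logs.txt   # dump all container logs to a local file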
As for the Spark driver logs, it depends on the mode you used to submit the Spark job: in client mode the driver logs go to standard output, while in cluster mode they are collected under the YARN application ID of the job.
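As a rough sketch (not from the original post: the file name log4j.properties and the jar name demo-app.jar are assumptions), a cluster-mode submission that ships a custom log4j configuration to both the driver and the executors might look like this:

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.wba.logtest.logtesting.app \
  demo-app.jar

With --deploy-mode client instead, the driver log lines appear directly on the submitting terminal's standard output.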
Otherwise, another good option is to send log messages through a log4j SocketAppender connected to Logstash/Elasticsearch.
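A minimal sketch of such an appender, assuming a Logstash log4j input is listening (the host logstash.example.com and port 4560 are placeholders, not values from the original post):

# Send log events over TCP to a remote Logstash log4j input
log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash.example.com
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000
# Attach it alongside the existing console appender
log4j.rootLogger=INFO, logstash, myConsoleAppender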