I am running my Spark application on a YARN cluster. No matter what I do, I cannot print logs from inside my RDD functions. Below is a sample snippet of an RDD processing function I wrote; I have simplified the code to illustrate the syntax I use. When I run it locally I can see the logs, but not in cluster mode. Neither System.err.println nor the logger seems to work, although I can see all of my driver logs. I even tried logging through the root logger, but it simply does not work inside the RDD processing function. I was desperate to see the log messages, so I finally found a guide that marks the logger as transient (https://www.mapr.com/blog/how-log-apache-spark), but even that did not help:
class SampleFlatMapFunction implements PairFlatMapFunction<Tuple2<String, String>, String, String> {

    private static final long serialVersionUID = 6565656322667L;

    // Marked transient so the logger itself is never serialized with the function.
    transient Logger executorLogger = LogManager.getLogger("sparkExecutor");

    // Re-create the logger after deserialization on the executor side.
    private void readObject(java.io.ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        executorLogger = LogManager.getLogger("sparkExecutor");
    }

    @Override
    public Iterable<Tuple2<String, String>> call(Tuple2<String, String> tuple) throws Exception {
        executorLogger.info("log testing from executorLogger ::");
        System.err.println("log testing from executorLogger system error stream");

        List<Tuple2<String, String>> updates = new ArrayList<>();
        // process the tuple, expand it, and add the results to the list
        return updates;
    }
}
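For reference, the same idea can also be written without readObject by resolving the logger lazily inside the function, so no logger field is ever serialized. This is just a minimal variant of the trick from the MapR article; the class name and message text are placeholders of mine:

import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.api.java.function.PairFlatMapFunction;

import scala.Tuple2;

class LazyLoggerFlatMapFunction implements PairFlatMapFunction<Tuple2<String, String>, String, String> {

    private static final long serialVersionUID = 1L;

    // Looked up on the executor JVM at call time; nothing logger-related is serialized.
    private static Logger logger() {
        return LogManager.getLogger("sparkExecutor");
    }

    @Override
    public Iterable<Tuple2<String, String>> call(Tuple2<String, String> tuple) {
        logger().info("log testing from lazily resolved executor logger");
        return new ArrayList<Tuple2<String, String>>();
    }
}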
My log4j configuration is as follows:
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/var/log/spark/spark.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n

log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=${spark.yarn.app.container.log.dir}/spark-app.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n

# By default, everything goes to console and file
log4j.rootLogger=INFO, RollingAppender, console

# My custom logging goes to another file
log4j.logger.sparkExecutor=INFO, stdout, RollingAppenderU
I have tried the yarn logs command and the Spark UI logs; nowhere can I find the log statements from the RDD processing function. I tried the command below, but it does not work:
yarn logs -applicationId <applicationId>

I even checked the HDFS path /tmp/logs/ as well.
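This is roughly how I search the aggregated container logs, assuming log aggregation is enabled (the application id and output file name are placeholders):

yarn logs -applicationId <applicationId> > all-containers.log
grep "log testing from executorLogger" all-containers.log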
I run my spark-submit command with the following arguments, and even then it did not work:
--master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
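For context, the full command looks roughly like this; the --files flag (to ship the properties file to the containers), the main class, and the jar name are placeholders/assumptions, not my exact values:

# class and jar names below are placeholders
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.SampleApp \
  sample-app.jar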
Can someone guide me on logging from Spark RDD and map functions? What am I missing in the steps above?