Where can I find the logs from a Spark RDD processing function? (YARN cluster mode)

Asked: 2016-04-28 19:36:33

Tags: logging apache-spark yarn spark-streaming

I am running my Spark application on a YARN cluster. No matter what I do, I cannot print logs from inside my RDD functions. Below is a sample snippet of the kind of code I write for an RDD processing function; I have simplified it to illustrate the syntax I use. When I run it locally I can see the logs, but not in cluster mode. Neither System.err.println nor the logger appears to work, yet I can see all of my driver logs. I even tried logging through the root logger, but it simply does not work inside the RDD processing function. Desperate to see the log messages, I finally found a guide on making the logger transient (https://www.mapr.com/blog/how-log-apache-spark), but even that did not help.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.log4j.LogManager;
    import org.apache.log4j.Logger;
    import org.apache.spark.api.java.function.PairFlatMapFunction;

    import scala.Tuple2;

    class SampleFlatMapFunction implements PairFlatMapFunction<Tuple2<String, String>, String, String> {

        private static final long serialVersionUID = 6565656322667L;

        // transient so the logger is not serialized with the closure;
        // readObject() below re-creates it on each executor after deserialization
        transient Logger executorLogger = LogManager.getLogger("sparkExecutor");

        private void readObject(java.io.ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            executorLogger = LogManager.getLogger("sparkExecutor");
        }

        @Override
        public Iterable<Tuple2<String, String>> call(Tuple2<String, String> tuple) throws Exception {
            executorLogger.info("log testing from executorLogger ::");
            System.err.println("log testing from executorLogger system error stream");

            List<Tuple2<String, String>> updates = new ArrayList<>();
            // process the tuple, expand it, and add the results to the list
            return updates;
        }
    }
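For context, here is a minimal sketch of how the function is wired into the job (the class name, context setup, and input data here are illustrative placeholders, not my real code):

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import scala.Tuple2;

    public class SampleApp {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("sample"));
            JavaPairRDD<String, String> input =
                    sc.parallelizePairs(Arrays.asList(new Tuple2<>("key", "value")));

            // call() runs on the executors, so the log lines inside it land in the
            // executor/container logs, not in the driver's stdout/stderr
            JavaPairRDD<String, String> updates = input.flatMapToPair(new SampleFlatMapFunction());
            updates.count(); // an action is needed to force the flatMap to actually run

            sc.stop();
        }
    }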

My log4j configuration is as follows:


    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    log4j.appender.stdout=org.apache.log4j.ConsoleAppender
    log4j.appender.stdout.target=System.out
    log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
    log4j.appender.stdout.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

    log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.RollingAppender.File=/var/log/spark/spark.log
    log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
    log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
    log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n

    log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.RollingAppenderU.File=${spark.yarn.app.container.log.dir}/spark-app.log
    log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
    log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
    log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n


    # By default, everything goes to console and file
    log4j.rootLogger=INFO, RollingAppender, console

    # My custom logging goes to another file
    log4j.logger.sparkExecutor=INFO, stdout, RollingAppenderU
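To verify which log4j.properties the executor JVMs actually load, one option (my own assumption, not something from the guide) is log4j 1.x's built-in -Dlog4j.debug switch; log4j then prints the location of the configuration file it parsed to stderr, which should show up in each container's stderr file:

    --conf "spark.executor.extraJavaOptions=-Dlog4j.debug -Dlog4j.configuration=log4j.properties"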

I have looked through the YARN logs and the Spark UI logs; nowhere can I find the log statements from the RDD processing function. I tried the command below, but it did not work:

yarn logs -applicationId <applicationId>

I even checked the HDFS path below:

/tmp/logs/
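Worth noting: both yarn logs and that HDFS path only contain anything when log aggregation is enabled on the cluster, and usually only after the application has finished. Under the default settings the aggregated logs for one application can be listed like this (user and application id are placeholders):

    # requires yarn.log-aggregation-enable=true in yarn-site.xml;
    # /tmp/logs is the default yarn.nodemanager.remote-app-log-dir
    hdfs dfs -ls /tmp/logs/<user>/logs/<applicationId>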

I run my spark-submit command with the following arguments, but even then it did not work:

    --master yarn --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties"
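For completeness, the full shape of the command I am describing (class and jar names below are placeholders). One thing I cannot rule out: -Dlog4j.configuration=log4j.properties is resolved as a relative path on every node, so unless the file is shipped with --files it may not exist in the executors' working directories at all:

    # --files puts log4j.properties into each YARN container's working directory,
    # where the relative path in -Dlog4j.configuration can then resolve
    spark-submit \
      --master yarn --deploy-mode cluster \
      --files log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
      --class com.example.SampleApp \
      sample-app.jar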

Can someone guide me on how to log from inside Spark RDD and map functions? What am I missing in the steps above?

0 Answers