Cannot find PySpark stdout logs

Date: 2017-09-19 10:34:41

Tags: apache-spark logging pyspark

I am working on a PySpark application that is deployed in YARN cluster mode. I have set up a logging StreamHandler that writes to stdout, and I can see the log output in the YARN UI. However, under /var/log/sparkapp/yarn I only find the stderr logs; there are no stdout logs. What could be the reason for this?

Here is the logging setup in my application:

import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
lsh = logging.StreamHandler(sys.stdout)
lsh.setLevel(logging.INFO)
lformat = logging.Formatter(fmt='%(asctime)s.%(msecs)03d %(levelname)s :%(name)s - %(message)s', datefmt='%m/%d/%Y %I:%M:%S')
lsh.setFormatter(lformat)
logger.addHandler(lsh)
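
For illustration, here is a minimal sketch of how such a logger might be used from the driver; the SparkContext setup and the messages are placeholders rather than the actual application code:

from pyspark import SparkContext

# Assumes the logging setup shown above has already run in this module.
sc = SparkContext(appName="sample-app")  # app name is illustrative
logger.info("Driver started")
total = sc.parallelize(range(10)).sum()
logger.info("Sum computed: %s", total)
sc.stop()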

log4j.properties

log4jspark.root.logger=INFO,console
log4jspark.log.dir=.
log4jspark.log.file=spark.log
log4jspark.log.maxfilesize=1024MB
log4jspark.log.maxbackupindex=10

# Define the root logger to the system property "spark.root.logger".
log4j.rootLogger=${log4jspark.root.logger}, EventCounter

# Set everything to be logged to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.appender.console.Threshold=INFO

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

1 Answer:

Answer 0 (score: 0):

Try using this code to get a logger for your Spark job:

log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
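
For example, a minimal sketch of using that logger in a driver script; it assumes sc is an existing SparkContext, and the messages are illustrative:

# Assumes sc is an existing SparkContext; the messages are illustrative.
log4jLogger = sc._jvm.org.apache.log4j
logger = log4jLogger.LogManager.getLogger(__name__)
logger.info("This goes through Spark's log4j configuration")
logger.warn("It appears wherever the console appender's target points")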

You can also modify log4j.properties so the console appender's target is stdout instead of stderr:

log4j.appender.console.target=System.out
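
With the target set to System.out, output routed through Spark's console appender should show up in the YARN containers' stdout files rather than only in stderr, which is where the question was looking for it.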