Redirecting Spark console logs to a file

Date: 2017-04-04 11:50:12

Tags: apache-spark

As a requirement, I want to keep some of the Spark master's logs so that errors are recorded when they occur. I know the web UI shows worker logs, but I'm not sure whether they show the same errors as the master.

I found that we have to modify conf/log4j.properties, but my attempt doesn't work.

Default configuration, plus the added file appender:

# Set everything to be logged to the console
log4j.rootCategory=INFO, console, file
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up
# nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

My attempt at configuring the file appender:

###Custom log file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/var/data/log/MasterLogs/master.log
log4j.appender.file.ImmediateFlush=true
## Set the append to false, overwrite
log4j.appender.file.Append=false
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
##Define the layout for file appender
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

2 answers:

Answer 0: (score: 3)

You need to create two log4j.properties files, one for the driver and one for the executors, and point each at its file through the Java options when you submit your application with spark-submit:

spark-submit --class MAIN_CLASS --driver-java-options "-Dlog4j.configuration=file:PATH_OF_LOG4J" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:PATH_OF_LOG4J" --master MASTER_IP:PORT JAR_PATH
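As a sketch of what the file behind `PATH_OF_LOG4J` could look like: a minimal log4j 1.x properties file that keeps console output and adds a rolling file appender. The path `/tmp/spark-driver.log` is only an illustrative choice; note that `RollingFileAppender` expects the property name `File`, not `fileName`:

```properties
log4j.rootCategory=INFO, console, file

# Console appender (same as the Spark default)
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Rolling file appender; "File" is the correct property name in log4j 1.x
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/tmp/spark-driver.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
```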

You can also check this blog for more details: https://blog.knoldus.com/2016/02/23/logging-spark-application-on-standalone-cluster/

Answer 1: (score: 2)

Use the following command. It writes both the output and the console logs to a file:

hadoop@osboxes:~/spark-2.0.1-bin-hadoop2.7/bin$ ./spark-submit test.py > tempoutfile.txt 2>&1
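The `> tempoutfile.txt 2>&1` part is plain shell redirection, not anything Spark-specific: `>` sends stdout to the file, and `2>&1` then duplicates stderr onto stdout, so both streams land in the same file (the order of the two redirections matters). A minimal sketch with an arbitrary command in place of spark-submit:

```shell
# > sends stdout to the file; 2>&1 duplicates stderr onto stdout,
# so both lines end up in combined.log, in order
{ echo "normal output"; echo "error output" >&2; } > combined.log 2>&1
cat combined.log
```

If the redirections were reversed (`2>&1 > combined.log`), stderr would still go to the terminal, because it would be duplicated onto the terminal's stdout before stdout was redirected.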