I want to create a custom logger in Spark. I want to send some messages from the executors to a local file for debugging. I tried to follow this tutorial, so I edited the log4j.properties file like this to create a custom logger that writes its output to /mypath/sparkU.log:
# My added lines
log4j.logger.myLogger=WARN, RollingAppenderU
log4j.appender.RollingAppenderU=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppenderU.File=/mypath/sparkU.log
log4j.appender.RollingAppenderU.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppenderU.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppenderU.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.rootLogger=${root.logger}
root.logger=WARN,console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
shell.log.level=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
log4j.logger.org.apache.spark.repl.Main=${shell.log.level}
log4j.logger.org.apache.spark.api.python.PythonGatewayServer=${shell.log.level}
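In case it matters: I am not sure whether the edited log4j.properties is picked up automatically on the cluster, or whether it has to be shipped explicitly with the job. If the latter, I assume the spark-submit call would look roughly like this (the paths and the script name test_logging.py are placeholders for my setup):

spark-submit \
  --master yarn \
  --files /mypath/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/mypath/log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  test_logging.py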
Then I submit this with spark-submit (I usually work in Python, but the language isn't the issue):
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("yarn") \
    .appName("test custom logging") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Reach the JVM-side log4j through the Py4J gateway
log4jLogger = spark.sparkContext._jvm.org.apache.log4j
log = log4jLogger.LogManager.getLogger(__name__)
log.error("Hello demo")
log.error("I am done")

print('hello from print')
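One detail I am unsure about: under spark-submit, __name__ resolves to "__main__", while the properties file configures a logger named myLogger. I do not know whether that mismatch matters, or whether I should request that logger explicitly, along these lines (a hypothetical variant, not my actual code):

# Hypothetical: fetch the logger by the name configured in log4j.properties,
# instead of __name__ (which is "__main__" when run via spark-submit)
log = log4jLogger.LogManager.getLogger("myLogger")
log.warn("does this reach /mypath/sparkU.log?")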
But the file sparkU.log is created empty. The Spark logs in the console and in HDFS are created correctly. Why is the log file empty, and what is the correct way to do something like this? I am using Spark 2.1 on YARN with the Cloudera distribution. Thanks for your advice.