I work on a cluster where I don't have permission to change the log4j.properties file to stop INFO logging when using pyspark (as explained in the first answer here). The following solution, described in the first answer to the question linked above, works for spark-shell (Scala):
import org.apache.log4j.Logger
import org.apache.log4j.Level
but for Spark with Python (i.e. pyspark), it doesn't work, and neither does
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
How can I stop the verbose printing of INFO in pyspark without changing the log4j.properties file?
Answer 0 (score: 18)
I used sc.setLogLevel("ERROR")
because I didn't have write access to our cluster's log4j.properties file. From the docs:
Control our logLevel. This overrides any user-defined log settings. Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
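For context, a minimal sketch of where that call goes (the app name below is illustrative, not from the original post):
from pyspark import SparkContext

sc = SparkContext(appName="quiet-app")
sc.setLogLevel("ERROR")  # suppresses INFO and WARN output from the JVM-side log4j loggers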
Answer 1 (score: 2)
This helped me:
import logging
from pyspark import SparkContext  # import added so the snippet runs on its own

# Silence py4j's Python-side gateway logging before the context is created
s_logger = logging.getLogger('py4j.java_gateway')
s_logger.setLevel(logging.ERROR)
spark_context = SparkContext()
Answer 2 (score: 1)
From https://stackoverflow.com/a/32208445/3811916:
logger = sc._jvm.org.apache.log4j
logger.LogManager.getLogger("org").setLevel( logger.Level.OFF )
logger.LogManager.getLogger("akka").setLevel( logger.Level.OFF )
did the trick for me. This is essentially how it's done in PySpark's own tests:
class QuietTest(object):
    def __init__(self, sc):
        self.log4j = sc._jvm.org.apache.log4j
    def __enter__(self):
        # Remember the current root log level, then silence everything below FATAL
        self.old_level = self.log4j.LogManager.getRootLogger().getLevel()
        self.log4j.LogManager.getRootLogger().setLevel(self.log4j.Level.FATAL)
    def __exit__(self, exc_type, exc_val, exc_tb):
        # Restore the original root log level on exit
        self.log4j.LogManager.getRootLogger().setLevel(self.old_level)
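A short usage sketch (the action inside the with-block is just an illustrative placeholder):
# Illustrative usage: raise the log level only for the duration of the block
with QuietTest(sc):
    sc.parallelize(range(10)).count()  # any noisy Spark action goes here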