I am running Spark 2.0.2 and deploying a streaming job on a Spark standalone cluster in cluster deploy mode. The streaming job itself works fine, but there is a problem with the stderr files created for the application and the driver under the work directory of SPARK_HOME: since the streaming job is always running, these files only keep growing in size, and I don't know how to control this.
I have tried the following solutions, even though they are not exactly related to the problem at hand, and they did not work.
Can anyone help me with how to limit the size of these files that are being created?
P.S.: I have already tried the solution of adding the following line to conf/spark-env.sh and restarting the cluster, but it has no effect while the streaming application is running:
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=60 -Dspark.worker.cleanup.appDataTtl=60"
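For reference, Spark standalone workers also expose executor-log rolling settings; a minimal spark-defaults.conf sketch (the property names are standard Spark settings, the size and retention values here are only illustrative):

# spark-defaults.conf — roll the executor stderr/stdout files written under the worker's work directory
# (128 MB per file and 5 retained files are illustrative values, not tuned)
spark.executor.logs.rolling.strategy          size
spark.executor.logs.rolling.maxSize           134217728
spark.executor.logs.rolling.maxRetainedFiles  5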
Edit:
@YuvalItzchakov I have tried your suggestion, but it did not work. The driver's stderr log is as follows:
Launch Command: "/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java" "-cp" "/mnt/spark2.0.2/conf/:/mnt/spark2.0.2/jars/*" "-Xmx2048M" "-Dspark.eventLog.enabled=true" "-Dspark.eventLog.dir=/mnt/spark2.0.2/JobsLogs" "-Dspark.executor.memory=2g" "-Dspark.deploy.defaultCores=2" "-Dspark.io.compression.codec=snappy" "-Dspark.submit.deployMode=cluster" "-Dspark.shuffle.consolidateFiles=true" "-Dspark.shuffle.compress=true" "-Dspark.app.name=Streamingjob" "-Dspark.kryoserializer.buffer.max=128M" "-Dspark.master=spark://172.16.0.27:7077" "-Dspark.shuffle.spill.compress=true" "-Dspark.serializer=org.apache.spark.serializer.KryoSerializer" "-Dspark.cassandra.input.fetch.size_in_rows=20000" "-Dspark.executor.extraJavaOptions=-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "-Dspark.jars=file:/mnt/spark2.0.2/sparkjars/StreamingJob-assembly-0.1.0.jar" "-Dspark.executor.instances=10" "-Dspark.driver.extraJavaOptions=-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "-Dspark.driver.memory=2g" "-Dspark.rpc.askTimeout=10" "-Dspark.eventLog.compress=true" "-Dspark.executor.cores=1" "-Dspark.driver.supervise=true" "-Dspark.history.fs.logDirectory=/mnt/spark2.0.2/JobsLogs" "-Dlog4j.configuration=file:///mnt/spark2.0.2/sparkjars/log4j.xml" "org.apache.spark.deploy.worker.DriverWrapper" "spark://Worker@172.16.0.29:34475" "/mnt/spark2.0.2/work/driver-20170210124424-0001/StreamingJob-assembly-0.1.0.jar" "Streamingjob"
========================================
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/02/10 12:44:26 INFO SecurityManager: Changing view acls to: cassuser
17/02/10 12:44:26 INFO SecurityManager: Changing modify acls to: cassuser
17/02/10 12:44:26 INFO SecurityManager: Changing view acls groups to:
17/02/10 12:44:26 INFO SecurityManager: Changing modify acls groups to:
My log4j.xml file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
<appender name="stdout" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="TRACE"/>
<param name="File" value="stdout"/>
<param name="maxFileSize" value="1MB"/>
<param name="maxBackupIndex" value="10"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
<filter class="org.apache.log4j.varia.LevelRangeFilter">
<param name="levelMin" value="ALL" />
<param name="levelMax" value="OFF" />
</filter>
</appender>
<appender name="stderr" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="WARN"/>
<param name="File" value="stderr"/>
<param name="maxFileSize" value="1MB"/>
<param name="maxBackupIndex" value="10"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
</appender>
</log4j:configuration>
Note that I removed the following root tag from the xml in your answer, because it was giving some errors:
<root>
<appender-ref ref="console"/>
</root>
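For reference, those errors are most likely because the root element references an appender named console that is not declared anywhere in the file; a minimal sketch of a root element that points at the appenders actually defined above (the level value is chosen only for illustration):

<root>
    <level value="INFO"/>
    <appender-ref ref="stdout"/>
    <appender-ref ref="stderr"/>
</root>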
Answer 0 (score 0):
You can use a custom log4j xml file. First, declare your XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
<appender name="stdout" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="TRACE"/>
<param name="File" value="stdout"/>
<param name="maxFileSize" value="50MB"/>
<param name="maxBackupIndex" value="100"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
<filter class="org.apache.log4j.varia.LevelRangeFilter">
<param name="levelMin" value="ALL" />
<param name="levelMax" value="OFF" />
</filter>
</appender>
<appender name="stderr" class="org.apache.log4j.RollingFileAppender">
<param name="threshold" value="WARN"/>
<param name="File" value="stderr"/>
<param name="maxFileSize" value="50MB"/>
<param name="maxBackupIndex" value="100"/>
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n"/>
</layout>
</appender>
<root>
<appender-ref ref="console"/>
</root>
</log4j:configuration>
Then, when you run your streaming job, you need to pass the log4j.xml file to the driver and the executors via extraJavaOptions:
spark-submit \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml"
Note that the path may be different on the master and worker nodes, depending on how you deploy the JAR and the file to Spark. You said you are using cluster mode, so I assume you distribute the JAR and the extra files manually, but for anyone running this in client mode you also need to ship the xml file via the --files flag.
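A minimal sketch of that client-mode variant (the paths and the application JAR name are placeholders, not from the original post; the absolute path used in the conf values assumes the file also exists at that location on the worker nodes):

# client mode: --files places log4j.xml in each executor's working directory
spark-submit \
  --deploy-mode client \
  --files /path/to/log4j.xml \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///path/to/log4j.xml" \
  your-streaming-job.jar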