I am running Spark 1.0.0 by connecting to a Spark standalone cluster with one master and two slaves. I run wordcount.py via spark-submit; it reads data from HDFS and writes the results back to HDFS. So far everything works and the results are written to HDFS correctly. What worries me is that when I check stdout for each worker, it is empty (I don't know whether it is supposed to be empty), while stderr contains the following:
Spark stderr log page for Some(app-20140704174955-0002)
Executor Command: "java" "-cp" "::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar:/usr/local/hadoop/conf"
  "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M"
  "org.apache.spark.executor.CoarseGrainedExecutorBackend"
  "akka.tcp://spark@master:54477/user/CoarseGrainedScheduler" "0" "slave2" "1"
  "akka.tcp://sparkWorker@slave2:41483/user/Worker" "app-20140704174955-0002"
========================================
14/07/04 17:50:14 ERROR CoarseGrainedExecutorBackend:
Driver Disassociated [akka.tcp://sparkExecutor@slave2:33758] ->
[akka.tcp://spark@master:54477] disassociated! Shutting down.
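For reference, the wordcount.py being submitted is roughly along these lines (a minimal sketch, not the actual script; the HDFS paths and master URL are placeholders):

# wordcount.py -- minimal sketch of the job described above
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

# Read from HDFS, count words, and write the result back to HDFS
# (the paths below are placeholders, not the real ones).
lines = sc.textFile("hdfs://master:9000/user/input.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs://master:9000/user/output")

sc.stop()

It is submitted with something like spark-submit --master spark://master:7077 wordcount.py.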
Answer 0 (score: 8)
Spark always writes everything, even INFO messages, to stderr. This appears to be done to stop stdout from buffering messages and making the logging less predictable. It is an accepted practice when an application's stdout is known never to be consumed by bash scripts, so it applies especially to logging.
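To illustrate the convention this answer describes (a sketch of the practice, not Spark's own code): send all logging to stderr so that stdout carries only the program's real output and stays safe to pipe:

import logging
import sys

# All log messages go to stderr; stdout carries only the job's
# actual output, so a shell pipeline sees clean data.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger(__name__)

log.info("starting work")   # -> stderr
print("result,42")          # -> stdout (machine-readable output)

With this split, python job.py | cut -d, -f2 sees only the result line, while the INFO message still reaches the terminal.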
Answer 1 (score: 6)
Try this in the log4j.properties you pass to Spark (or modify the default one under Spark's conf/ directory):
# Log to stdout and stderr
log4j.rootLogger=INFO, stdout, stderr
# Send TRACE - INFO level to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=TRACE
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.filter.filter1=org.apache.log4j.varia.LevelRangeFilter
log4j.appender.stdout.filter.filter1.levelMin=TRACE
log4j.appender.stdout.filter.filter1.levelMax=INFO
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Send WARN or higher to stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Threshold=WARN
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
# Change this to set Spark log level
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.spark.util=ERROR
Also, the progress bar that is shown at INFO level goes to stderr.
Disable it with spark.ui.showConsoleProgress=false.
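The same property can also be set from the application itself; a hedged PySpark sketch (note that the progress bar and spark.ui.showConsoleProgress only exist from Spark 1.2 onward, not in the 1.0.0 used in the question):

from pyspark import SparkConf, SparkContext

# Disable the console progress bar programmatically instead of via
# configuration files (applies to Spark 1.2+, where the bar exists).
conf = SparkConf().set("spark.ui.showConsoleProgress", "false")
sc = SparkContext(conf=conf)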