I'm running a Spark job on EMR, but my log messages are not being written to the logs. I expected my log messages to be mixed in with the Spark log messages produced while the Hadoop job runs. When I run the job locally, my log messages are printed among the rest of the log output as expected.
I have tried the following, none of which work:
import org.slf4j.LoggerFactory
...
val logger = LoggerFactory.getLogger(MyPoc.getClass())
logger.info("message here")
and
import org.apache.log4j.Logger
...
val logger = Logger.getRootLogger()
logger.info("message here")
and
import org.apache.spark.Logging
object MyPoc extends App with Logging {
...
logInfo("message here")
...
}
How can I get my log messages written to the log files of a Spark job running on EMR?
I launch the job as follows:
aws emr create-cluster --name EMR-Spark-PoC --ami-version 3.3.1 \
--instance-type=m1.medium --instance-count 2 \
--ec2-attributes KeyName=key-dev,InstanceProfile=EMRJobflowDefault \
--log-uri s3://my-logs/emr/ \
--bootstrap-action Name=Spark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x] \
--steps Name=SparkPoC,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[/home/hadoop/spark/bin/spark-submit,--deploy-mode,cluster,--master,yarn-cluster,--class,my.poc.EmrPoc,s3://my-dev/poc-0.0.1.jar,s3n://my-data/avro/part-m-00000.avro,s3n://my-data/avro/part-m-00000.avro] \
--no-auto-terminate
I build a fat JAR with sbt-assembly. Here is most of my build.sbt:
libraryDependencies ++= Seq(
"org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided",
"org.apache.spark" %% "spark-core" % "1.2.0" % "provided",
"org.apache.spark" %% "spark-sql" % "1.2.0",
"com.databricks" %% "spark-avro" % "0.1"
)
assemblyMergeStrategy in assembly := {
case x if x.endsWith(".class") => MergeStrategy.last
case x if x.endsWith(".properties") => MergeStrategy.last
case x if x.contains("/resources/") => MergeStrategy.last
case x if x.startsWith("META-INF/mailcap") => MergeStrategy.last
case x if x.startsWith("META-INF/mimetypes.default") => MergeStrategy.first
case x if x.startsWith("META-INF/maven/org.slf4j/slf4j-api/pom.") => MergeStrategy.first
case x =>
val oldStrategy = (assemblyMergeStrategy in assembly).value
if (oldStrategy == MergeStrategy.deduplicate)
MergeStrategy.first
else
oldStrategy(x)
}
assemblyExcludedJars in assembly := {
val cp = (fullClasspath in assembly).value
cp filter {_.data.getName == "avro-ipc-1.7.7-tests.jar"}
}
Answer 0 (score: 0)
You can add a bootstrap action on EMR that appends your application's log4j settings to the EMR log4j configuration.
1) Example bootstrap action
#!/bin/bash
set -x
CUSTOM_LOG4J_FILE=$1
CUSTOM_LOG4J_FILE_NAME="customlog4j.txt"
echo "Starting to copy over logging configuration on EMR"
hadoop fs -get $CUSTOM_LOG4J_FILE /home/hadoop/
cat /home/hadoop/$CUSTOM_LOG4J_FILE_NAME >> /home/hadoop/conf/log4j.properties
exit 0
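The append step above can be exercised locally before wiring it into EMR. A minimal sketch, using hypothetical `/tmp` paths in place of the cluster's `/home/hadoop/conf/log4j.properties`:

```shell
#!/bin/bash
# Simulate the bootstrap action's append step with hypothetical local paths.
echo "log4j.rootLogger=INFO, console" > /tmp/log4j.properties   # stand-in for EMR's existing config
echo "log4j.logger.my.poc=DEBUG" > /tmp/customlog4j.txt         # your custom overrides
cat /tmp/customlog4j.txt >> /tmp/log4j.properties               # same append the script performs
grep "my.poc" /tmp/log4j.properties                             # confirm the override landed
```

Because the custom lines come last in the file, they take precedence over any earlier settings for the same logger names.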
2) Example contents of customlog4j.txt
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=ERROR
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=ERROR
log4j.logger.org.apache.spark=ERROR
log4j.logger.akka=ERROR
log4j.logger.io=ERROR
log4j.logger.my.poc=DEBUG
Note: if you only need to change the log4j options for the Spark driver, the install-spark bootstrap action has a -l option.
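For example, assuming -l takes the S3 path of a log4j properties file (the exact argument form is not shown here, so treat this as a sketch and check the install-spark documentation; the s3://my-dev/... path is a placeholder):

```shell
# Hypothetical: pass a custom log4j file to install-spark via its -l option
--bootstrap-action Name=Spark,Path=s3://support.elasticmapreduce/spark/install-spark,Args=[-x,-l,s3://my-dev/customlog4j.properties]
```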