How do I log inside foreachRDD in a streaming application?

Asked: 2016-12-29 21:19:45

Tags: apache-spark spark-streaming yarn

We are working on an application that consumes Kafka messages with Spark Streaming, running on a Hadoop/YARN Spark cluster. I have the log4j properties in place on both the driver and the workers, but I still do not see the log messages from inside foreachRDD. I do see "start for each rdd" and "end of for each rdd".

val broadcastLme = sc.broadcast(lme)
logInfo("start for each rdd: ")
val lines: DStream[MetricTypes.InputStreamType] = myConsumer.createDefaultStream()
lines.foreachRDD { rdd =>
  // !rdd.isEmpty() already covers the rdd.count() > 0 check, and is cheaper
  if ((rdd != null) && !rdd.isEmpty()) {
    logInfo("filteredLines: " + rdd.count())
    logInfo("start loop")
    rdd.foreach { x =>
      val lme = broadcastLme.value
      lme.aParser(x).get
    }
    logInfo("end loop")
  }
}

logInfo("end of for each rdd ")

lines.print(10)

I am using this to run the application on the cluster:
spark-submit --verbose --class DevMain --master yarn-cluster --deploy-mode cluster --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --files "hdfs://hdfs-name-node:8020/user/hadoopuser/log4j.properties" hdfs://hdfs-name-node:8020/user/hadoopuser/streaming_2.10-1.0.0-SNAPSHOT.jar hdfs://hdfs-name-node:8020/user/hadoopuser/enriched.properties
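(Here --files ships log4j.properties into each YARN container's working directory, which is why -Dlog4j.configuration can refer to it by bare file name: the container working directory is on the classpath.)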

I am new to Spark. Could someone please help me understand why I do not see the log messages inside foreachRDD? Here is log4j.properties:

log4j.rootLogger=WARN, rolling

log4j.appender.rolling=org.apache.log4j.RollingFileAppender
log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
log4j.appender.rolling.layout.conversionPattern=[%p] %d %c %M - %m%n
log4j.appender.rolling.maxFileSize=100MB
log4j.appender.rolling.maxBackupIndex=10
log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/titanium-spark-enriched.log
#log4j.appender.rolling.encoding=UTF-8

log4j.logger.org.apache.spark=WARN
log4j.logger.org.eclipse.jetty=WARN

log4j.logger.com.x1.projectname=INFO

#log4j.appender.console=org.apache.log4j.ConsoleAppender
#log4j.appender.console.target=System.err
#log4j.appender.console.layout=org.apache.log4j.PatternLayout
#log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
#log4j.logger.org.spark-project.jetty=WARN
#log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
#log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
#log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR

#log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.RollingAppender.File=./logs/spark/enriched.log
#log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
#log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
#log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n


#log4j.rootLogger=INFO, RollingAppender, console
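One thing worth double-checking against this config: the INFO override only applies to loggers whose names fall under com.x1.projectname, so logInfo must be emitted from a class in that package (with Spark's Logging trait the logger name is the class name; the root level here is WARN). A minimal sketch with plain SLF4J, using a hypothetical class in that package:

package com.x1.projectname  // hypothetical; must sit under the package configured at INFO above

import org.slf4j.LoggerFactory

object LogLevelCheck {
  // Logger name becomes "com.x1.projectname.LogLevelCheck$", which matches
  // the log4j.logger.com.x1.projectname=INFO entry above.
  private val log = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    log.info("visible at the configured INFO level")
    log.debug("not visible: DEBUG is below INFO")
  }
}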

1 Answer:

Answer 0 (score: 0)

The problem with your Spark Streaming application appears to be that start is missing after you build the streaming computation, i.e.

ssc.start()

Quoting the scaladoc of StreamingContext:

After creating and transforming DStreams, the streaming computation can be started and stopped using context.start() and context.stop(), respectively.

context.awaitTermination() allows the current thread to wait for the termination of the context by stop() or by an exception.
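To make that concrete, here is a minimal sketch of a driver with the start/await pair in place. The queueStream source and the 10-second batch interval are stand-ins so the sketch is self-contained, since the question's Kafka consumer setup is not shown:

import scala.collection.mutable.Queue

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DevMain {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("enriched-streaming")
    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is an assumption

    // Stand-in source; the question builds the stream over Kafka with
    // myConsumer.createDefaultStream() instead.
    val lines = ssc.queueStream(Queue.empty[RDD[String]])

    lines.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        rdd.foreach(x => println(x)) // per-record processing elided
      }
    }

    ssc.start()            // without this, foreachRDD never executes and nothing is logged
    ssc.awaitTermination() // wait for stop() or an exception
  }
}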