Getting CSV sink metrics files from spark-submit at runtime

Asked: 2016-08-18 13:51:57

Tags: apache-spark

With metrics.properties enabled under /conf (CSV sink enabled, see the configuration below), metrics are collected on every job submission (via spark-submit) and saved to /tmp/:

# Enable CsvSink for all instances
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Polling period for CsvSink
*.sink.csv.period=1
*.sink.csv.unit=minutes

# Polling directory for CsvSink
*.sink.csv.directory=/tmp/

# Worker instance overlap polling period
worker.sink.csv.period=1
worker.sink.csv.unit=minutes
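If the CsvSink is picked up, Spark writes one CSV file per metric into the polling directory, so checking for those files is a quick sanity test. The file names and values below are illustrative only (actual names depend on the application id, instance, and metric sources):

ls /tmp/*.csv
# e.g. app-20160818135157-0000.driver.jvm.heap.used.csv
#      app-20160818135157-0000.driver.BlockManager.memory.memUsed_MB.csv
head -2 /tmp/app-20160818135157-0000.driver.jvm.heap.used.csv
# t,value
# 1471528317,524288000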

Now I want to supply the metrics.properties file at runtime (with the same configuration as above), so I pass it to spark-submit like this:

$spark_home/bin/spark-submit --files=file:///home/log_properties/metrics.properties --conf spark.metrics.conf=./metrics.properties --class com.myClass job1.jar

I get the following warning, even though my metrics.properties file contains no Graphite configuration at all (I took metrics.template and enabled only the CSV settings shown above):

 WARN graphite.GraphiteReporter: Unable to report to Graphite
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:589)
        at java.net.Socket.connect(Socket.java:538)
        at java.net.Socket.<init>(Socket.java:434)
        at java.net.Socket.<init>(Socket.java:244)
        at javax.net.DefaultSocketFactory.createSocket(SocketFactory.java:277)
        at com.codahale.metrics.graphite.Graphite.connect(Graphite.java:118)
        at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:167)
        at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162)
        at org.apache.spark.metrics.sink.GraphiteSink.report(GraphiteSink.scala:91)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$report$1.apply(MetricsSystem.scala:114)
        at org.apache.spark.metrics.MetricsSystem$$anonfun$report$1.apply(MetricsSystem.scala:114)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.metrics.MetricsSystem.report(MetricsSystem.scala:114)
        at org.apache.spark.SparkContext$$anonfun$stop$3.apply$mcV$sp(SparkContext.scala:1715)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1219)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1714)
        at org.apache.spark.SparkContext$$anonfun$3.apply$mcV$sp(SparkContext.scala:596)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:267)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:239)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:239)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:239)

Does Spark report to Graphite by default and ignore my metrics.properties (which enables only the CSV sink)?

2 Answers:

Answer 0 (score: 1)

Pass it via -Dspark.metrics.conf=metrics.properties rather than via --conf spark.metrics.conf=./metrics.properties. That is why your file, although shipped with --files, is not used for the metrics configuration and the default metrics.properties is used instead.
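For reference, a sketch of what the full invocation could look like with that change; --driver-java-options and spark.executor.extraJavaOptions are standard spark-submit options, but this exact combination is an assumption, not something verified against the setup above:

$spark_home/bin/spark-submit \
  --files file:///home/log_properties/metrics.properties \
  --driver-java-options "-Dspark.metrics.conf=metrics.properties" \
  --conf "spark.executor.extraJavaOptions=-Dspark.metrics.conf=metrics.properties" \
  --class com.myClass job1.jar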

Answer 1 (score: 0)

Yes, it turned out I had a metrics.properties file locally (in the directory I run spark-submit from) that still contained Graphite settings, while the file I passed via --files=file:///home/log_properties/metrics.properties in spark-submit did not. I fixed the problem by updating the local file (removing the Graphite entries). I am still puzzled why Spark cares about the local file (metrics.properties) at all, when I had already passed the metrics.properties I wanted the job to use.
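A hypothetical way to confirm this behavior: a relative path such as ./metrics.properties is resolved against the driver's working directory, so in client mode the file in the launch directory wins. Inspecting that directory shows where the Graphite sink came from:

# run from the directory where spark-submit is launched
ls -l ./metrics.properties
grep -i graphite ./metrics.properties   # any GraphiteSink entries here explain the warning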