The simple Spark program below runs absolutely fine if I launch it with sbt run. But it fails if I run it either of these ways:
1) spark-submit.cmd eventfilter-assembly-0.1-SNAPSHOT.jar
(the jar is created with sbt assembly, with the spark-streaming and spark-sql dependencies marked "provided" in the sbt file)
2) spark-submit.cmd --jars play-json_2.10-2.3.10.jar eventfilter_2.10-0.1-SNAPSHOT.jar
In both cases the application starts up and waits for new files to arrive; up to that point everything is fine. But as soon as I start dropping files into the directory so it can stream them, the exception below is thrown.
Note: I am using spark-1.4.1-bin-hadoop2.6.
Note: The same program runs smoothly for hours when started via sbt run.
Note: I also tried 1.5.2, changing the sbt file accordingly; the behavior is the same.
Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.net.ConnectException: Connection timed out: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1168)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1104)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:998)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:932)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:639)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:453)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:398)
at org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:390)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:390)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:193)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
sbt file:
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.4.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.4.1"
libraryDependencies += "com.typesafe.play" %% "play-json" % "2.3.10"
Scala file:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object EventFilter { // object name assumed here; the question does not show it
  def main(args: Array[String]) {
    val conf = new SparkConf()
    conf
      .setMaster("local[*]") // Comment out if executing through spark-submit
      .setAppName("test")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(3)) // 3-second micro-batches
    // Watch a directory for newly created files and print the per-batch line count
    val dStream = ssc.textFileStream("dir")
    dStream.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}
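For reference, a sketch of a fuller submit command for case 2, assuming the main object is named EventFilter (the question does not show the actual class name, so the --class value here is an assumption):
spark-submit.cmd --class EventFilter --master local[*] --jars play-json_2.10-2.3.10.jar eventfilter_2.10-0.1-SNAPSHOT.jar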
Answer 0 (score: 0)
I had a VirtualBox setup, and by default Spark was picking up that adapter's IP to serve the application jar. After disabling the VirtualBox IP, Spark started working fine. (This matches the stack trace above: the executor times out in Utils.doFetchFile while downloading the application jar from the driver's HTTP file server, which is why the job only breaks once the first task is launched, not at startup.)
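As an alternative to disabling the adapter, Spark can be pinned to the host's real interface; a sketch assuming that address is 192.168.1.10 (a hypothetical value — SPARK_LOCAL_IP and spark.driver.host are the standard knobs in Spark 1.x):
REM Windows cmd: force the driver to bind to and advertise the real host address
REM 192.168.1.10 is a placeholder; substitute the machine's actual IP
set SPARK_LOCAL_IP=192.168.1.10
spark-submit.cmd --conf spark.driver.host=192.168.1.10 eventfilter-assembly-0.1-SNAPSHOT.jar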