我正在尝试使用Scala编写Spark流媒体应用程序,该应用程序应按照here提供的说明每秒读取Twitter提要。
我的代码是:
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import TutorialHelper._
object Tutorial {
def main(args: Array[String]) {
// Checkpoint directory
val checkpointDir = TutorialHelper.getCheckpointDirectory()
// Configure Twitter credentials
val apiKey = "blabla"
val apiSecret = "blabla"
val accessToken = "blabla"
val accessTokenSecret = "blabla"
TutorialHelper.configureTwitterCredentials(apiKey, apiSecret, accessToken, accessTokenSecret)
val ssc = new StreamingContext(new SparkConf(), Seconds(1))
val tweets = TwitterUtils.createStream(ssc, None)
val statuses = tweets.map(status => status.getText())
statuses.print()
ssc.checkpoint(checkpointDir)
ssc.start()
ssc.awaitTermination()
}
}
当我尝试执行时,使用spark-submit我得到以下日志:
Configuring Twitter OAuth
Property twitter4j.oauth.consumerKey set as [blabla]
Property twitter4j.oauth.accessToken set as [blabla]
Property twitter4j.oauth.consumerSecret set as [blabla]
Property twitter4j.oauth.accessTokenSecret set as [blabla]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/09/28 13:22:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/28 13:22:11 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.163.145 instead (on interface eth0)
15/09/28 13:22:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/09/28 13:22:13 INFO Slf4jLogger: Slf4jLogger started
15/09/28 13:22:13 INFO Remoting: Starting remoting
15/09/28 13:22:13 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.163.145:59422]
15/09/28 13:22:15 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
15/09/28 13:22:17 INFO WriteAheadLogManager : Recovered 1 write ahead log files from file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata
-------------------------------------------
Time: 1443471738000 ms
-------------------------------------------
15/09/28 13:22:18 INFO WriteAheadLogManager : Attempting to clear 1 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471737000: file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata/log-1443468716010-1443468776010
15/09/28 13:22:18 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471737000
-------------------------------------------
Time: 1443471739000 ms
-------------------------------------------
15/09/28 13:22:19 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471738000:
15/09/28 13:22:19 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471738000
-------------------------------------------
Time: 1443471740000 ms
-------------------------------------------
15/09/28 13:22:20 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471739000:
15/09/28 13:22:20 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471739000
-------------------------------------------
Time: 1443471741000 ms
-------------------------------------------
15/09/28 13:22:21 INFO WriteAheadLogManager : Attempting to clear 0 old log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471740000:
15/09/28 13:22:21 INFO WriteAheadLogManager : Cleared log files in file:/home/nikos/Desktop/spark-1.5.0-bin-hadoop2.6/streaming/scala/checkpoint/receivedBlockMetadata older than 1443471740000
15/09/28 13:22:21 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 INFO TwitterStreamImpl: Establishing connection.
15/09/28 13:22:21 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:542)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:532)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1975)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/09/28 13:22:21 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-7] shutting down ActorSystem [sparkDriver]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:169)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:136)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$2$$anonfun$receive$1.applyOrElse(ReceiverSupervisorImpl.scala:79)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/09/28 13:22:21 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/09/28 13:22:21 ERROR ErrorMonitor: Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-7] shutting down ActorSystem [sparkDriver]
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:52)
at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
at org.apache.spark.Logging$class.logInfo(Logging.scala:59)
at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStop(TwitterInputDStream.scala:101)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stopReceiver(ReceiverSupervisor.scala:169)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.stop(ReceiverSupervisor.scala:136)
at org.apache.spark.streaming.receiver.ReceiverSupervisorImpl$$anon$2$$anonfun$receive$1.applyOrElse(ReceiverSupervisorImpl.scala:79)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126)
at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
15/09/28 13:22:21 WARN ReceiverTracker: Receiver 0 exited but didn't deregister
15/09/28 13:22:21 INFO WriteAheadLogManager : Stopped write ahead log manager
15/09/28 13:22:22 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
15/09/28 13:22:22 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
15/09/28 13:22:22 WARN AkkaRpcEndpointRef: Error sending message [message = RemoveBroadcast(0,true)] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1080029491]] had already been terminated.. This timeout is controlled by spark.rpc.askTimeout
at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at org.spark-project.guava.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:293)
at scala.concurrent.impl.ExecutionContextImpl$$anon$1.execute(ExecutionContextImpl.scala:133)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40)
at scala.concurrent.impl.Promise$DefaultPromise.scala$concurrent$impl$Promise$DefaultPromise$$dispatchOrAddCallback(Promise.scala:280)
at scala.concurrent.impl.Promise$DefaultPromise.onComplete(Promise.scala:270)
at scala.concurrent.Future$class.recover(Future.scala:324)
at scala.concurrent.impl.Promise$DefaultPromise.recover(Promise.scala:153)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:319)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:128)
at org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
at org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
at org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
Caused by: akka.pattern.AskTimeoutException: Recipient[Actor[akka://sparkDriver/user/BlockManagerMaster#1080029491]] had already been terminated.
at akka.pattern.AskableActorRef$.ask$extension(AskSupport.scala:132)
at org.apache.spark.rpc.akka.AkkaRpcEndpointRef.ask(AkkaRpcEnv.scala:307)
... 14 more
答案 0 :(得分:2)
这个问题也得到了回答:
Spark streaming StreamingContext.start() - Error starting receiver 0
可以通过更改build.sbt中的scalaVersion和libraryDependencies来解决这个问题,以匹配Spark版本的那些。
e.g:
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-streaming" % "1.6.1" % "provided",
"org.apache.spark" %% "spark-streaming-twitter" % "1.6.1"
)
请务必使用新生成的scala-2.11类,例如:
../../../../spark-1.6.1/bin/spark-submit --class Tutorial /Users/mendezr/development/spark/exercises_spark_submit_2014/usb/streaming/scala/target/scala-2.11/Tutorial-assembly-0.1-SNAPSHOT.jar
答案 1 :(得分:0)
对于spark 1.6.x,请使用scala 2.10.6 版本,并确保在build.sbt文件中设置。
确保您的主机名链接到linux中的当前IP 环境主机在/ etc / hosts文件中解析。