Correct build.sbt file for Apache Spark

Posted: 2015-01-08 10:16:03

Tags: scala sbt apache-spark

How should I write my build.sbt file so that I get a standalone Apache Spark application that uses Spark Streaming and Spark SQL? My build.sbt file currently looks like this:

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10" % "1.2.0",
  "org.apache.spark" % "spark-streaming_2.10" % "1.2.0",
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.2.0",
  "org.apache.spark" % "spark-sql_2.10" % "1.2.0",
  "org.apache.spark" % "spark-catalyst_2.10" % "1.2.0"
)

lazy val root = (project in file(".")).
  settings(
    name := "Test",
    version := "0.1",
    scalaVersion := "2.10.4"
  )

To compile everything I run sbt compile, and then I launch my application with:

sbt "runMain SqlTest spark://Marvins-MacBook-Air.local:7077"

where SqlTest is defined as follows:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext


case class User(uid: String, name: String, surname: String)

object SqlTest {
  def main(args: Array[String]) {
    val Array(master) = args
    val sparkConf = new SparkConf().setMaster(master).setAppName("sql-test")
    val sc = new SparkContext(sparkConf)

    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD

    val pathUsers = "users.txt"
    val users = sc.textFile(pathUsers)
      .map(_.split(" "))
      .map(u => User(u(0), u(1), u(2)))

    users.collect()
  }
}

This raises the following error:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/01/08 10:57:51 INFO SecurityManager: Changing view acls to: se7entyse7en
15/01/08 10:57:51 INFO SecurityManager: Changing modify acls to: se7entyse7en
15/01/08 10:57:51 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(se7entyse7en); users with modify permissions: Set(se7entyse7en)
15/01/08 10:57:51 INFO Slf4jLogger: Slf4jLogger started
15/01/08 10:57:52 INFO Remoting: Starting remoting
15/01/08 10:57:52 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.100.195:64794]
15/01/08 10:57:52 INFO Utils: Successfully started service 'sparkDriver' on port 64794.
15/01/08 10:57:52 INFO SparkEnv: Registering MapOutputTracker
15/01/08 10:57:52 INFO SparkEnv: Registering BlockManagerMaster
15/01/08 10:57:52 INFO DiskBlockManager: Created local directory at /var/folders/3r/v7swlvdn2p7_wyh9wj90td2m0000gn/T/spark-local-20150108105752-6363
15/01/08 10:57:52 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
2015-01-08 10:57:52.555 java[9606:1986593] Unable to load realm info from SCDynamicStore
15/01/08 10:57:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/08 10:57:52 INFO HttpFileServer: HTTP File server directory is /var/folders/3r/v7swlvdn2p7_wyh9wj90td2m0000gn/T/spark-dcd2b5f9-1ef8-4248-b868-91b59464d8e2
15/01/08 10:57:52 INFO HttpServer: Starting HTTP Server
15/01/08 10:57:52 INFO Utils: Successfully started service 'HTTP file server' on port 64795.
15/01/08 10:57:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/01/08 10:57:53 INFO SparkUI: Started SparkUI at http://192.168.100.195:4040
15/01/08 10:57:53 INFO AppClient$ClientActor: Connecting to master spark://Marvins-MacBook-Air.local:7077...
15/01/08 10:57:53 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150108105753-0016
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor added: app-20150108105753-0016/0 on worker-20150108094022-192.168.100.195-63861 (192.168.100.195:63861) with 4 cores
15/01/08 10:57:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150108105753-0016/0 on hostPort 192.168.100.195:63861 with 4 cores, 512.0 MB RAM
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor added: app-20150108105753-0016/1 on worker-20150108093605-192.168.100.195-63799 (192.168.100.195:63799) with 4 cores
15/01/08 10:57:53 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150108105753-0016/1 on hostPort 192.168.100.195:63799 with 4 cores, 512.0 MB RAM
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/1 is now LOADING
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/0 is now LOADING
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/0 is now RUNNING
15/01/08 10:57:53 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/1 is now RUNNING
15/01/08 10:57:53 INFO NettyBlockTransferService: Server created on 64797
15/01/08 10:57:53 INFO BlockManagerMaster: Trying to register BlockManager
15/01/08 10:57:53 INFO BlockManagerMasterActor: Registering block manager 192.168.100.195:64797 with 530.3 MB RAM, BlockManagerId(<driver>, 192.168.100.195, 64797)
15/01/08 10:57:53 INFO BlockManagerMaster: Registered BlockManager
15/01/08 10:57:54 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/01/08 10:57:54 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/1 is now RUNNING
15/01/08 10:57:54 INFO AppClient$ClientActor: Executor updated: app-20150108105753-0016/0 is now RUNNING
15/01/08 10:57:54 INFO MemoryStore: ensureFreeSpace(138675) called with curMem=0, maxMem=556038881
15/01/08 10:57:54 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 135.4 KB, free 530.1 MB)
15/01/08 10:57:54 INFO MemoryStore: ensureFreeSpace(18512) called with curMem=138675, maxMem=556038881
15/01/08 10:57:54 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.1 KB, free 530.1 MB)
15/01/08 10:57:54 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.100.195:64797 (size: 18.1 KB, free: 530.3 MB)
15/01/08 10:57:54 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/01/08 10:57:54 INFO SparkContext: Created broadcast 0 from textFile at sqlTest.scala:17
15/01/08 10:57:55 INFO FileInputFormat: Total input paths to process : 1
15/01/08 10:57:55 INFO SparkContext: Starting job: collect at sqlTest.scala:21
15/01/08 10:57:55 INFO DAGScheduler: Got job 0 (collect at sqlTest.scala:21) with 2 output partitions (allowLocal=false)
15/01/08 10:57:55 INFO DAGScheduler: Final stage: Stage 0(collect at sqlTest.scala:21)
15/01/08 10:57:55 INFO DAGScheduler: Parents of final stage: List()
15/01/08 10:57:55 INFO DAGScheduler: Missing parents: List()
15/01/08 10:57:55 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[3] at map at sqlTest.scala:19), which has no missing parents
15/01/08 10:57:55 INFO MemoryStore: ensureFreeSpace(2792) called with curMem=157187, maxMem=556038881
15/01/08 10:57:55 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.7 KB, free 530.1 MB)
15/01/08 10:57:55 INFO MemoryStore: ensureFreeSpace(1981) called with curMem=159979, maxMem=556038881
15/01/08 10:57:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1981.0 B, free 530.1 MB)
15/01/08 10:57:55 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.100.195:64797 (size: 1981.0 B, free: 530.3 MB)
15/01/08 10:57:55 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/08 10:57:55 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/01/08 10:57:55 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[3] at map at sqlTest.scala:19)
15/01/08 10:57:55 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/08 10:57:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@192.168.100.195:64802/user/Executor#1169546094] with ID 0
15/01/08 10:57:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:56 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:56 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@192.168.100.195:64805/user/Executor#1103970424] with ID 1
15/01/08 10:57:57 INFO BlockManagerMasterActor: Registering block manager 192.168.100.195:64808 with 265.4 MB RAM, BlockManagerId(0, 192.168.100.195, 64808)
15/01/08 10:57:57 INFO BlockManagerMasterActor: Registering block manager 192.168.100.195:64809 with 265.4 MB RAM, BlockManagerId(1, 192.168.100.195, 64809)
15/01/08 10:57:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.100.195): java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

15/01/08 10:57:57 INFO TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 1]
15/01/08 10:57:57 INFO TaskSetManager: Starting task 1.1 in stage 0.0 (TID 2, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Starting task 0.1 in stage 0.0 (TID 3, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Lost task 0.1 in stage 0.0 (TID 3) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 2]
15/01/08 10:57:57 INFO TaskSetManager: Starting task 0.2 in stage 0.0 (TID 4, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Lost task 1.1 in stage 0.0 (TID 2) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 3]
15/01/08 10:57:57 INFO TaskSetManager: Starting task 1.2 in stage 0.0 (TID 5, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Lost task 0.2 in stage 0.0 (TID 4) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 4]
15/01/08 10:57:57 INFO TaskSetManager: Starting task 0.3 in stage 0.0 (TID 6, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Lost task 1.2 in stage 0.0 (TID 5) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 5]
15/01/08 10:57:57 INFO TaskSetManager: Starting task 1.3 in stage 0.0 (TID 7, 192.168.100.195, PROCESS_LOCAL, 1326 bytes)
15/01/08 10:57:57 INFO TaskSetManager: Lost task 0.3 in stage 0.0 (TID 6) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 6]
15/01/08 10:57:57 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/01/08 10:57:57 INFO TaskSchedulerImpl: Cancelling stage 0
15/01/08 10:57:57 INFO TaskSchedulerImpl: Stage 0 was cancelled
15/01/08 10:57:57 INFO DAGScheduler: Job 0 failed: collect at sqlTest.scala:21, took 2.089380 s
[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.100.195): java.io.EOFException
[error]     at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
[error]     at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
[error]     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
[error]     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
[error]     at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
[error]     at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
[error]     at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
[error]     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
[error]     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
[error]     at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
[error]     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
[error]     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[error]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error]     at java.lang.reflect.Method.invoke(Method.java:606)
[error]     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
[error]     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
[error]     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
[error]     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
[error]     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
[error]     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
[error]     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
[error]     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
[error]     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
[error]     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
[error]     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
[error]     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[error]     at java.lang.Thread.run(Thread.java:744)
[error] 
[error] Driver stacktrace:
15/01/08 10:57:57 INFO TaskSetManager: Lost task 1.3 in stage 0.0 (TID 7) on executor 192.168.100.195: java.io.EOFException (null) [duplicate 7]
15/01/08 10:57:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.100.195): java.io.EOFException
at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2744)
at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1032)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
at org.apache.hadoop.io.UTF8.readChars(UTF8.java:216)
at org.apache.hadoop.io.UTF8.readString(UTF8.java:208)
at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:237)
at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
at org.apache.spark.SerializableWritable$$anonfun$readObject$1.apply$mcV$sp(SerializableWritable.scala:43)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:985)
at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:runMain for the full output.
15/01/08 10:57:57 ERROR Utils: Uncaught exception in thread SparkListenerBus
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:996)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:48)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply(LiveListenerBus.scala:47)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:46)
15/01/08 10:57:57 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:136)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply(ContextCleaner.scala:134)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1460)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:133)
at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:65)
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:runMain for the full output.
[error] (compile:runMain) Nonzero exit code: 1
[error] Total time: 7 s, completed Jan 8, 2015 10:57:57 AM

2 Answers:

Answer 0 (score: 1)

It is not enough to just compile the project; you also need to package it as an assembled ("fat") jar that contains all of its dependencies. Resolving all of the dependency conflicts can be a bit tricky. You can take a look at my Spark example project that uses Spark Streaming; it might be a helpful starting point:

https://github.com/pellucidanalytics/tweet-driven-comparable-companies

In particular, have a look at how the dependency exclusions are handled: https://github.com/pellucidanalytics/tweet-driven-comparable-companies/blob/master/project/Dependency.scala
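As a rough sketch of what such a packaging setup could look like (illustrative only, not taken from the linked project; the plugin version and key names below are assumptions and differ slightly between sbt-assembly releases):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")

// build.sbt
libraryDependencies ++= Seq(
  // Spark itself is already present on the cluster, so it can be marked
  // "provided" and left out of the assembled jar
  "org.apache.spark" % "spark-core_2.10" % "1.2.0" % "provided",
  "org.apache.spark" % "spark-sql_2.10" % "1.2.0" % "provided",
  "org.apache.spark" % "spark-streaming_2.10" % "1.2.0" % "provided",
  // the Kafka integration is not shipped with the Spark distribution, so bundle it
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.2.0"
)

// Discard duplicate META-INF entries so that merging the dependency jars does not fail
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}

Running sbt assembly then produces a single jar under target/scala-2.10/ that can be shipped to the cluster (for example with spark-submit).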

Also have a read through these articles:

http://prabstechblog.blogspot.com/2014/04/creating-single-jar-for-spark-project.html
http://eugenezhulenev.com/blog/2014/11/20/twitter-analytics-with-spark/
http://eugenezhulenev.com/blog/2014/10/18/run-tests-in-standalone-spark-cluster/

Answer 1 (score: 0)

You are running the application on a Spark cluster, so my guess is that the problem is that your code cannot find the users.txt file. Try using a path that is accessible independently of the current working directory, and/or use local[3] as the master argument.
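As a quick illustration (the path below is hypothetical, not taken from the question): running against the local master keeps everything inside the driver JVM, so a relative path resolved against the sbt working directory is enough:

sbt "runMain SqlTest local[3]"

If you keep the standalone master instead, point textFile at a location every worker can actually read, e.g. an absolute path on a shared filesystem or an HDFS URI:

// hypothetical URI -- any location readable by all workers works
val pathUsers = "hdfs://localhost:9000/user/se7entyse7en/users.txt"
val users = sc.textFile(pathUsers)
  .map(_.split(" "))
  .map(u => User(u(0), u(1), u(2)))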