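Is it possible to create a Spark Streaming application using the Spark IPython notebook boilerplate? Since the Spark context is preconfigured by the notebook, this does not seem possible. I am trying a simple application: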
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
val ssc = new StreamingContext(sc, Seconds(1))
val lines = ssc.socketTextStream("129.41.138.175", 9999)
// Split each line into words
val words = lines.flatMap(_.split(" "))
// Count each word in each batch
val pairs = words.map(word => (word, 1))
val wordCounts = pairs.reduceByKey(_ + _)
// Print the first ten elements of each RDD generated in this DStream
wordCounts.print()
ssc.start() // Start the computation
ssc.awaitTermination() // Wait for the computation to terminate
Error:
Name: akka.actor.InvalidActorNameException
Message: actor name [JobScheduler] is not unique!
StackTrace: akka.actor.dungeon.ChildrenContainer$NormalChildrenContainer.reserve(ChildrenContainer.scala:130)
akka.actor.dungeon.Children$class.reserveChild(Children.scala:77)
akka.actor.ActorCell.reserveChild(ActorCell.scala:369)
akka.actor.dungeon.Children$class.makeChild(Children.scala:202)
akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
akka.actor.ActorCell.attachChild(ActorCell.scala:369)
akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:58)
...
Answer 0 (score: 0)
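We support Spark Streaming. It is built into the Spark that we deploy on Bluemix. However, whether it works depends on the Spark version and the language used: earlier Spark releases such as 1.3.1 do not support streaming from Python. The current version is 1.4.1.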
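For the Scala code in the question, this particular exception is typically seen when a StreamingContext has already been started on the notebook's SparkContext, for example because the cell was run a second time. That diagnosis is an assumption and is not confirmed in this thread. Below is a minimal sketch of a notebook-friendly pattern, assuming Spark 1.3 or later (for `awaitTerminationOrTimeout`), the notebook-provided `sc`, and a placeholder `localhost` socket source:

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // as in the question

// Assumption: `sc` is the SparkContext that the notebook has already created.
val ssc = new StreamingContext(sc, Seconds(1))

// Placeholder host and port; substitute the socket source you actually use.
val lines = ssc.socketTextStream("localhost", 9999)

// Same word count as in the question, printed once per batch.
lines.flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .print()

ssc.start()

// Wait at most 30 seconds instead of blocking forever, so the cell returns.
ssc.awaitTerminationOrTimeout(30000)

// Stop only the streaming side and keep the notebook's SparkContext alive.
ssc.stop(stopSparkContext = false, stopGracefully = false)
```

If an earlier StreamingContext is still running, stop it the same way (or restart the kernel) before creating a new one. A stopped StreamingContext cannot be restarted, so each run needs a fresh one.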