This is the test class for a Spark application written in Scala with ScalaTest. When I run sbt test, I get a java.lang.ExceptionInInitializerError caused by org.apache.spark.SparkException: A master URL must be set in your configuration, and the tests are not executed. I don't understand this, because I set the master to local when declaring the conf. Does anyone know why?
Test class:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.scalatest._

class SizeByMailboxTest extends FlatSpec with Matchers with BeforeAndAfter {

  val master = "local"
  val appName = "example-spark"

  var sc: SparkContext = _

  before {
    val conf = new SparkConf().setMaster(master).setAppName(appName)
    sc = new SparkContext(conf)
  }

  after {
    if (sc != null) {
      sc.stop()
    }
  }

  behavior of "SizeByMailbox"

  it should "count total content size per mailbox with duplicates" in {
    val sample = Array(
      SizeByMailbox.Message("1", 10, 50),
      SizeByMailbox.Message("2", 5, 60),
      SizeByMailbox.Message("2", 8, 40),
      SizeByMailbox.Message("1", 7, 80)
    )
    val samples = sc.parallelize(sample)
    val sizeById = SizeByMailbox.count(samples)

    sizeById.collect().map(m => SizeByMailbox.MailBox(m.mailboxid, m.totalsize)) should contain allOf (
      SizeByMailbox.MailBox("1", 130),
      SizeByMailbox.MailBox("2", 100)
    )
  }
}
Application:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD

object SizeByMailbox {

  val sc = new SparkContext();

  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  def count(messages: RDD[Message]): RDD[MailBox] = {
    val total_by_mailbox = messages.map(m => (m.mailboxid, m.fullcontentoctets)).reduceByKey(_ + _).map(m => MailBox(m._1, m._2))
    total_by_mailbox
  }

  def main(args: Array[String]) {
    // enter code here ...
  }
}
Answer 0 (score: 0)
You are creating a second SparkContext in your application object, but without specifying a SparkConf, so it has no master URL. As soon as the test references SizeByMailbox, the object's initializer runs and constructs that context, which is why the failure surfaces as an ExceptionInInitializerError wrapping the SparkException.
object SizeByMailbox {
  val sc = new SparkContext(); // <-- here
You haven't posted the full stack trace, but this is what I suspect is causing the error.
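A minimal sketch of the kind of fix this implies (my restructuring, not code from the question): drop the context field from the object so that loading it has no side effects, and create the one application context, with a proper SparkConf, only inside main. The master value "local[*]" below is an assumption for local runs:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark versions

object SizeByMailbox {

  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  // Pure RDD transformation: it receives the RDD it needs, so referencing
  // this object from a test no longer constructs a SparkContext.
  def count(messages: RDD[Message]): RDD[MailBox] =
    messages
      .map(m => (m.mailboxid, m.fullcontentoctets))
      .reduceByKey(_ + _)
      .map { case (id, total) => MailBox(id, total) }

  def main(args: Array[String]): Unit = {
    // The only place the application itself builds a context.
    val conf = new SparkConf().setMaster("local[*]").setAppName("example-spark")
    val sc = new SparkContext(conf)
    try {
      // ... build an RDD[Message] and call count(...) here
    } finally {
      sc.stop()
    }
  }
}

With this layout, the test's before/after blocks own the context during tests, and main owns it when the application runs, so only one context ever exists at a time.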
As a good practice, avoid creating more than one active SparkContext per JVM, since Spark can behave unpredictably when several exist.
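If several places really do need to obtain a context, SparkContext.getOrCreate (available since Spark 1.4) returns the already-active context instead of constructing a second one. A small sketch, reusing the conf values from the question; the ContextHelper object is a hypothetical name, not from the original code:

import org.apache.spark.{SparkConf, SparkContext}

object ContextHelper {
  def context(): SparkContext = {
    val conf = new SparkConf().setMaster("local").setAppName("example-spark")
    // Returns the active context if one exists; the conf is only
    // consulted when a new context actually has to be created.
    SparkContext.getOrCreate(conf)
  }
}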