Spark test with ScalaTest fails

Date: 2016-08-25 15:05:52

Tags: scala

This is the test class for a Spark application written in Scala with ScalaTest. When I run sbt test, I get a java.lang.ExceptionInInitializerError caused by org.apache.spark.SparkException: A master URL must be set in your configuration, and the tests are not executed. I don't understand why, since I set the master to local when declaring the conf. Does anyone know why?

The test class:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.scalatest._

class SizeByMailboxTest extends FlatSpec with Matchers with BeforeAndAfter {

  val master = "local"
  val appName = "example-spark"
  var sc: SparkContext = _

  before {
    val conf = new SparkConf().setMaster(master).setAppName(appName)
    sc = new SparkContext(conf)
  }

  after {
    if (sc != null) {
      sc.stop()
    }
  }

  behavior of "SizeByMailbox"

  it should "count total content size per mailbox with duplicates" in {
    val sample = Array(
      SizeByMailbox.Message("1", 10, 50),
      SizeByMailbox.Message("2", 5, 60),
      SizeByMailbox.Message("2", 8, 40),
      SizeByMailbox.Message("1", 7, 80)
    )
    val samples = sc.parallelize(sample)
    val sizeById = SizeByMailbox.count(samples)
    sizeById.collect().map(m => SizeByMailbox.MailBox(m.mailboxid, m.totalsize)) should contain allOf (
      SizeByMailbox.MailBox("1", 130),
      SizeByMailbox.MailBox("2", 100)
    )
  }
}

The application:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD


object SizeByMailbox {
  val sc = new SparkContext();

  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  def count(messages: RDD[Message]): RDD[MailBox] = {
    val total_by_mailbox = messages
      .map(m => (m.mailboxid, m.fullcontentoctets))
      .reduceByKey(_ + _)
      .map(m => MailBox(m._1, m._2))
    total_by_mailbox
  }

  def main(args: Array[String]) {
    // ....
  }
}

1 Answer:

Answer 0 (score: 0)

You are creating another SparkContext in your application object, but without specifying a SparkConf, so it has no master URL. Because it sits in the object's initializer, it is constructed the first time the test references SizeByMailbox, which is why the SparkException surfaces as an ExceptionInInitializerError.

object SizeByMailbox {
    val sc = new SparkContext(); <-- here

You haven't posted the stack trace, but this is what I suspect is causing the error.

As a matter of good practice, try not to create more than one active SparkContext per JVM, since Spark can behave unpredictably otherwise.
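
A minimal sketch of the fix, reusing the master and app name from your own test class: drop the object-level context, keep count as a pure function over RDDs, and build the SparkContext inside main from an explicit SparkConf. The body of main is illustrative, since the original was elided.

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SizeByMailbox {
  case class Message(mailboxid: String, bodyoctets: Int, fullcontentoctets: Int)
  case class MailBox(mailboxid: String, totalsize: Int)

  // Pure transformation over RDDs: it needs no SparkContext of its own,
  // so referencing the object from a test no longer creates one.
  def count(messages: RDD[Message]): RDD[MailBox] =
    messages
      .map(m => (m.mailboxid, m.fullcontentoctets))
      .reduceByKey(_ + _)
      .map(m => MailBox(m._1, m._2))

  def main(args: Array[String]): Unit = {
    // Create the context only when running as an application,
    // with an explicit master URL (hard-coded here for illustration).
    val conf = new SparkConf().setMaster("local").setAppName("example-spark")
    val sc = new SparkContext(conf)
    try {
      // ... application logic using sc ...
    } finally {
      sc.stop()
    }
  }
}

With the context gone from the object initializer, the before block in your test is the only place a SparkContext gets created during sbt test, so only one context is active in the JVM at a time.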