I have written unit tests around the DataframeGenerator example, which lets you generate mock DataFrames on the fly.
After successfully running the following commands:
sbt clean
sbt update
sbt compile
I get the error shown in the output below when running either of these commands:
sbt assembly
sbt test -- -oF
Output:
...
[info] SearchClicksProcessorTest:
17/11/24 14:19:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/24 14:19:07 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/11/24 14:19:18 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/11/24 14:19:18 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
17/11/24 14:19:19 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
[info] - testExplodeMap *** FAILED ***
[info] ExceptionInInitializerError was thrown during property evaluation.
[info] Message: "None"
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testFilterByClicks *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
[info] - testGetClicksData *** FAILED ***
[info] NoClassDefFoundError was thrown during property evaluation.
[info] Message: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
[info] Occurred when passed generated values (
[info]
[info] )
...
[info] *** 3 TESTS FAILED ***
[error] Failed: Total 6, Failed 3, Errors 0, Passed 3
[error] Failed tests:
[error] com.company.spark.ml.pipelines.search.SearchClicksProcessorTest
[error] (root/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 73 s, completed 24 Nov, 2017 2:19:28 PM
Things I've tried so far haven't worked.
My questions are:
EDIT-1: My unit test class contains several methods like the one below:
import com.holdenkarau.spark.testing.DataframeGenerator
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
import org.scalacheck.Prop
import org.scalatest.FunSuite
import org.scalatest.prop.Checkers

class SearchClicksProcessorTest extends FunSuite with Checkers {

  // `spark` is a SparkSession assumed to be provided by the surrounding test setup
  import spark.implicits._

  test("testGetClicksData") {
    val schemaIn = StructType(List(
      StructField("rank", IntegerType),
      StructField("city_id", IntegerType),
      StructField("target", IntegerType)
    ))
    val schemaOut = StructType(List(
      StructField("clicked_res_rank", IntegerType),
      StructField("city_id", IntegerType)
    ))
    val dataFrameGen = DataframeGenerator.arbitraryDataFrame(spark.sqlContext, schemaIn)
    val property = Prop.forAll(dataFrameGen.arbitrary) { dfIn: DataFrame =>
      dfIn.cache()
      val dfOut: DataFrame = dfIn.transform(SearchClicksProcessor.getClicksData)
      dfIn.schema === schemaIn &&
        dfOut.schema === schemaOut &&
        dfIn.filter($"target" === 1).count() === dfOut.count()
    }
    check(property)
  }
}
and the build.sbt looks like this:
// core settings
organization := "com.company"
scalaVersion := "2.11.11"
name := "repo-name"
version := "0.0.1"
// cache options
offline := false
updateOptions := updateOptions.value.withCachedResolution(true)
// aggregate options
aggregate in assembly := false
aggregate in update := false
// fork options
fork in Test := true
//common libraryDependencies
libraryDependencies ++= Seq(
scalaTest,
typesafeConfig,
...
scalajHttp
)
libraryDependencies ++= allAwsDependencies
libraryDependencies ++= SparkDependencies.allSparkDependencies
assemblyMergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
...
case _ => MergeStrategy.first
}
// note: hyphens are not legal in Scala identifiers, so the modules are named module1 / module2
lazy val module1 = project in file("directory-1")

lazy val module2 = (project in file("directory-2"))
  .dependsOn(module1)
  .aggregate(module1)

lazy val root = (project in file("."))
  .dependsOn(module2)
  .aggregate(module2)
Answer 0 (score: 0)
P.S. Please read the comments on the original question before reading this answer.
Even the popular solution of overriding SBT's transitive dependency on faster-xml.jackson didn't work for me as-is; some more changes were needed (the ExceptionInInitializerError was gone, but some other errors cropped up).
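For reference, the dependency-override approach mentioned above usually looks like the sketch below in build.sbt. The jackson version here is a placeholder; the right value depends on which version your Spark release expects, so treat it as an assumption to verify against your dependency tree.

```scala
// Sketch: pin jackson to the version Spark expects (version number is hypothetical)
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-core" % "2.6.7",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.7"
)
```

(With sbt 1.x, `dependencyOverrides` is a `Seq`, so use `++= Seq(...)` instead.)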
Finally (in addition to the fix mentioned above), I ended up creating the DataFrames in a different way (as opposed to the StructType approach used here). I created them as

spark.sparkContext.parallelize(Seq(MyType)).toDF()

where MyType is a case class matching the schema of the DataFrame.
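Concretely, the case-class approach can be sketched as follows. The `Click` case class and its fields are hypothetical stand-ins for whatever schema your DataFrame needs; the snippet assumes a local SparkSession.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class mirroring the DataFrame's schema
case class Click(clicked_res_rank: Int, city_id: Int)

val spark = SparkSession.builder().master("local[*]").appName("example").getOrCreate()
import spark.implicits._

// Column names and types are derived from the case class fields
val df = spark.sparkContext.parallelize(Seq(Click(1, 42))).toDF()
df.printSchema()
```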
While implementing this solution, I ran into a small problem: although the data types of the schema generated from the case class were correct, the nullability of the fields often did not match; a fix for this issue was found here.
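One way to work around such a nullability mismatch is to rebuild the DataFrame with every field forced to nullable. This is a sketch of that idea, not necessarily the exact fix linked above; `setAllNullable` is a hypothetical helper name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

// Sketch: copy the schema with nullable = true on every field,
// then re-create the DataFrame over the same rows
def setAllNullable(df: DataFrame): DataFrame = {
  val schema = StructType(df.schema.map {
    case StructField(name, dtype, _, meta) => StructField(name, dtype, nullable = true, meta)
  })
  df.sqlContext.createDataFrame(df.rdd, schema)
}
```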
Here I blatantly admit that I'm not sure which was the correct fix: the faster-xml.jackson dependency override or the alternative way of creating the DataFrame. So please feel free to fill in the gaps in understanding/investigating this problem.