Scala - Unit testing a function that returns a Column

Time: 2017-08-07 20:28:03

Tags: scala unit-testing scalatest

I have a function isJSON() that returns a comparison of type Column.

  def isJSON( element: Column ): Column = {
    element.contains("{") && element.contains("}")
  }

This is how I normally use it, and it works as expected:

df.withColumn("is_json", isJSON( col("data") ))

I'm trying to write a unit test using FunSpec, but I can't make assertions on data of type Column.

describe("isJSON()") {
  it("should return false if data is not JSON") {
    val df = Seq( "Not a JSON" ).toDF( "data" )
    assert( isJSON( df("data") ).equals( lit( false ) ))
  }
}

The unit test fails with the following stack trace:

ScalaTestFailureLocation: com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1 at (DatalakeFunSpecTest.scala:29)
org.scalatest.exceptions.TestFailedException: datalake.this.`package`.isJSON(df.apply("data")).equals(org.apache.spark.sql.functions.lit(false)) was false
    at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
    at org.scalatest.FunSpec.newAssertionFailedException(FunSpec.scala:1626)
    at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(DatalakeFunSpecTest.scala:29)
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23)
    at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23)
    at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    at org.scalatest.Transformer.apply(Transformer.scala:22)
    at org.scalatest.Transformer.apply(Transformer.scala:20)
    at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:422)
    at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
    at org.scalatest.FunSpec.withFixture(FunSpec.scala:1626)
    at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:419)
    at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
    at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
    at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:431)
    at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$runTest(DatalakeFunSpecTest.scala:13)
    at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
    at com.mhedu.common.datalake.DatalakeFunSpecTest.runTest(DatalakeFunSpecTest.scala:13)
    at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464)
    at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
    at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
    at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
    at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
    at org.scalatest.FunSpecLike$class.runTests(FunSpecLike.scala:464)
    at org.scalatest.FunSpec.runTests(FunSpec.scala:1626)
    at org.scalatest.Suite$class.run(Suite.scala:1424)
    at org.scalatest.FunSpec.org$scalatest$FunSpecLike$$super$run(FunSpec.scala:1626)
    at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468)
    at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468)
    at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
    at org.scalatest.FunSpecLike$class.run(FunSpecLike.scala:468)
    at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$run(DatalakeFunSpecTest.scala:13)
    at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
    at com.mhedu.common.datalake.DatalakeFunSpecTest.run(DatalakeFunSpecTest.scala:13)
    at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
    at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
    at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
    at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
    at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
    at org.scalatest.tools.Runner$.run(Runner.scala:883)
    at org.scalatest.tools.Runner.run(Runner.scala)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
    at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)

Is there a way to write assertions for the Column type, or to somehow extract the underlying Boolean value and compare that?

1 Answer:

Answer 0 (score: 0)

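You are testing the equality of two Column instances. These instances are not the same: applied to your DataFrame they would produce the same result, but as objects they are not equal (you could easily apply them to a different DataFrame and get different results).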
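To make the point concrete, here is a minimal sketch of what the failing assertion actually compares. It assumes that Column.equals compares the underlying expressions rather than evaluating anything (that is my understanding of Spark's behavior, not something stated in the question), and the object name ColumnEqualityDemo is only there for illustration. No SparkSession is needed, because no data is ever evaluated.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical demo object; only isJSON comes from the question.
object ColumnEqualityDemo {
  def isJSON(element: Column): Column =
    element.contains("{") && element.contains("}")

  // Assumption: equals() compares the expression trees, not evaluated data.
  // Two columns built from the same expression therefore compare as equal.
  val sameExpression: Boolean = col("data").equals(col("data"))

  // isJSON(...) is an AND of two "contains" expressions, while lit(false) is a
  // literal, so the two never compare as equal, whatever the data holds.
  val predicateVsLiteral: Boolean = isJSON(col("data")).equals(lit(false))

  def main(args: Array[String]): Unit = {
    assert(sameExpression)
    assert(!predicateVsLiteral)
  }
}
```

In other words, the assertion in the question fails before any row of the DataFrame is looked at.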

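One way to test this is to filter the DataFrame on the condition that the two Columns (the result of isJSON and lit(true)) are equal, and then assert that the size of the result is 0: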

describe("isJSON()") {
  it("should return false if data is not JSON") {
    val df = Seq("Not a JSON").toDF( "data" )
    assert(df.filter(isJSON(df("data")) === lit(true)).count() == 0)
  }
}

Another option is to collect the results of evaluating this column and assert that they are all false, for example:

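// Needed for the Row(...) pattern match below.
import org.apache.spark.sql.Row

describe("isJSON()") {
  it("should return false if data is not JSON") {
    val df = Seq("Not a JSON").toDF("data")
    // Evaluate the column, then pull the Boolean out of each collected Row.
    val results: Array[Boolean] = df.select(isJSON(df("data"))).collect().map { case Row(b: Boolean) => b }
    assert(results sameElements Array(false))
  }
}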

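There are many other similar options. The important concept here is to compare data rather than Column objects: as long as the comparison in your assertion is between Columns, you are not comparing actual results.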
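As one example of such an option, a small helper can evaluate the Column against some input strings and hand plain Booleans back to the test. This is only a sketch: the names IsJsonCheck and evaluate are made up for illustration, and it assumes a local SparkSession is acceptable in your test setup.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; only isJSON comes from the question.
object IsJsonCheck {
  def isJSON(element: Column): Column =
    element.contains("{") && element.contains("}")

  // Applies a Column => Column function to each input string and returns
  // the evaluated results as plain Booleans, which can be asserted on directly.
  def evaluate(spark: SparkSession, inputs: Seq[String], f: Column => Column): Seq[Boolean] = {
    import spark.implicits._
    inputs.toDF("data").select(f(col("data"))).as[Boolean].collect().toSeq
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a single-threaded local session is good enough for a unit test.
    val spark = SparkSession.builder().master("local[1]").appName("isJSON-check").getOrCreate()
    try {
      val results = evaluate(spark, Seq("Not a JSON", """{"a": 1}"""), isJSON)
      assert(results == Seq(false, true))
    } finally {
      spark.stop()
    }
  }
}
```

Inside a FunSpec test this would read something like assert(evaluate(spark, Seq("Not a JSON"), isJSON) == Seq(false)), so the comparison is between Scala values and not between Columns.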