I have a function, isJSON(), that returns a comparison expression of type Column.
def isJSON( element: Column ): Column = {
element.contains("{") && element.contains("}")
}
This is how I normally use it, and it works as expected:
df.withColumn("is_json", isJSON( col("data") ))
I am trying to write a unit test with FunSpec, but I cannot work out how to assert on a value of type Column.
describe("isJSON()") {
it("should return false if data is not JSON") {
val df = Seq( "Not a JSON" ).toDF( "data" )
assert( isJSON( df("data") ).equals( lit( false ) ))
}
}
The unit test fails with the following stack trace:
ScalaTestFailureLocation: com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1 at (DatalakeFunSpecTest.scala:29)
org.scalatest.exceptions.TestFailedException: datalake.this.`package`.isJSON(df.apply("data")).equals(org.apache.spark.sql.functions.lit(false)) was false
at org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
at org.scalatest.FunSpec.newAssertionFailedException(FunSpec.scala:1626)
at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(DatalakeFunSpecTest.scala:29)
at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23)
at com.mhedu.common.datalake.DatalakeFunSpecTest$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(DatalakeFunSpecTest.scala:23)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSpecLike$$anon$1.apply(FunSpecLike.scala:422)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at org.scalatest.FunSpec.withFixture(FunSpec.scala:1626)
at org.scalatest.FunSpecLike$class.invokeWithFixture$1(FunSpecLike.scala:419)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
at org.scalatest.FunSpecLike$$anonfun$runTest$1.apply(FunSpecLike.scala:431)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
at org.scalatest.FunSpecLike$class.runTest(FunSpecLike.scala:431)
at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$runTest(DatalakeFunSpecTest.scala:13)
at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
at com.mhedu.common.datalake.DatalakeFunSpecTest.runTest(DatalakeFunSpecTest.scala:13)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464)
at org.scalatest.FunSpecLike$$anonfun$runTests$1.apply(FunSpecLike.scala:464)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
at org.scalatest.FunSpecLike$class.runTests(FunSpecLike.scala:464)
at org.scalatest.FunSpec.runTests(FunSpec.scala:1626)
at org.scalatest.Suite$class.run(Suite.scala:1424)
at org.scalatest.FunSpec.org$scalatest$FunSpecLike$$super$run(FunSpec.scala:1626)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468)
at org.scalatest.FunSpecLike$$anonfun$run$1.apply(FunSpecLike.scala:468)
at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
at org.scalatest.FunSpecLike$class.run(FunSpecLike.scala:468)
at com.mhedu.common.datalake.DatalakeFunSpecTest.org$scalatest$BeforeAndAfter$$super$run(DatalakeFunSpecTest.scala:13)
at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
at com.mhedu.common.datalake.DatalakeFunSpecTest.run(DatalakeFunSpecTest.scala:13)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2557)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
at org.scalatest.tools.Runner$.run(Runner.scala:883)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
Is there a way to write an assertion against a Column, or to somehow extract the underlying Boolean value and compare that?
Answer 0 (score: 0)
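You are testing two Column instances for equality. These instances are not the same: they would produce the same result if applied to your DataFrame, but they are not equal as objects (you could just as easily apply them to a different DataFrame and get different results).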
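To make the point concrete, here is a minimal sketch, assuming Spark SQL is on the classpath. The names ColumnEqualityDemo and candidate are made up for illustration. It builds the same expression as isJSON without touching any DataFrame, which shows that a Column is only a description of a computation and that equals compares those descriptions rather than row values.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical demo object; not part of the original question.
object ColumnEqualityDemo {
  // Same expression that isJSON builds, written against an unresolved column.
  val candidate: Column = col("data").contains("{") && col("data").contains("}")

  def main(args: Array[String]): Unit = {
    // Prints the expression itself, not a Boolean: nothing has been evaluated.
    println(candidate)

    // Compares the expression with the literal `false` as objects.
    // This is the comparison that made the assertion in the question fail.
    println(candidate.equals(lit(false)))
  }
}
```

No rows are read at any point here, so there is no Boolean for the assertion to see until the Column is applied to a DataFrame and the result is brought back to the driver.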
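One way to test this is to filter the DataFrame on the condition that the two Columns (the result of isJSON and lit(true)) are equal, and then assert that the size of the result is 0: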
describe("isJSON()") {
it("should return false if data is not JSON") {
val df = Seq("Not a JSON").toDF( "data" )
assert(df.filter(isJSON(df("data")) === lit(true)).count() == 0)
}
}
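Another option is to collect the results of evaluating this column and assert that all of them are false (the pattern match below needs org.apache.spark.sql.Row in scope), for example: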
describe("isJSON()") {
it("should return false if data is not JSON") {
val df = Seq("Not a JSON").toDF( "data" )
val results: Array[Boolean] = df.select(isJSON(df("data"))).collect().map { case Row(b: Boolean) => b }
assert(results sameElements Array(false))
}
}
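There are many other similar options. The important idea here is to compare data rather than Column objects: as long as the things being compared in the assertion are Columns, you are not comparing actual results.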
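As one more variation on the same idea, the sketch below wraps the "evaluate, then compare plain values" step in a small helper. It assumes a local SparkSession is acceptable in tests. The names IsJsonCheck and evalOnString are hypothetical and do not come from Spark or ScalaTest.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical test support object; not part of the original answer.
object IsJsonCheck {
  // Local session for illustration only; a real suite would likely share one.
  private val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("isjson-check")
    .getOrCreate()

  // Function under test, copied from the question.
  def isJSON(element: Column): Column =
    element.contains("{") && element.contains("}")

  // Builds a one-row DataFrame from `input`, applies `f` to its "data"
  // column, and returns the single Boolean result to the driver.
  def evalOnString(input: String)(f: Column => Column): Boolean = {
    import spark.implicits._
    Seq(input).toDF("data").select(f(col("data"))).as[Boolean].head()
  }

  def main(args: Array[String]): Unit = {
    // Both assertions compare plain Booleans, never Column objects.
    assert(!evalOnString("Not a JSON")(isJSON))
    assert(evalOnString("""{"a": 1}""")(isJSON))
    spark.stop()
  }
}
```

Inside a FunSpec test the body of main would become the it(...) block, with the session created and stopped in the suite's setup and teardown.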