Precision error when comparing two DataFrames in ScalaTest

Posted: 2019-09-06 05:30:18

Tags: scala apache-spark-sql scalatest

I am trying to compare an actual DataFrame with expected data in a ScalaTest suite. Both DataFrames have a count column of type Int, but when comparing the two frames I get precision errors.

Can someone suggest where I am going wrong?

Below is the comparison output for the actual df:

Row
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   30-39|    1|20190906|
+--------+-----+--------+
was considered not equal to
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   30-39|    1|20190905|
+--------+-----+--------+

Row
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   80-89|    2|20190906|
+--------+-----+--------+
was considered not equal to
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   80-89|    2|20190905|
+--------+-----+--------+

Row
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   90-99|    1|20190906|
+--------+-----+--------+
was considered not equal to
+--------+-----+--------+
|agerange|count|datadate|
+--------+-----+--------+
|   90-99|    1|20190905|
+--------+-----+--------+

schema tolerance:
  * precisions by column
      [*, 1.0E-6]
  * ignore nullable flag for each column
  * ignore column order
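
The matcher that produced the output above (with per-column precision tolerances) is not shown in the post, but for reference the same kind of failure can be reproduced with plain ScalaTest assertions. The sketch below is a minimal, self-contained stand-in, not the poster's actual test: the suite name, the assertDataFrameEquals helper, and the sample rows are all assumptions.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.scalatest.FunSuite
import org.scalatest.exceptions.TestFailedException

class CountsByAgeRangeSpec extends FunSuite {

  private lazy val spark = SparkSession.builder()
    .master("local[1]")
    .appName("df-comparison-sketch")
    .getOrCreate()

  // Hypothetical helper: compare schemas, then rows irrespective of order.
  private def assertDataFrameEquals(actual: DataFrame, expected: DataFrame): Unit = {
    assert(actual.schema === expected.schema)
    assert(actual.collect().toSeq.sortBy(_.toString) ===
      expected.collect().toSeq.sortBy(_.toString))
  }

  test("rows that differ only in datadate are reported as unequal") {
    import spark.implicits._
    val actual   = Seq(("30-39", 1, "20190906")).toDF("agerange", "count", "datadate")
    val expected = Seq(("30-39", 1, "20190905")).toDF("agerange", "count", "datadate")
    // The count values match; the comparison fails on the datadate column,
    // mirroring the "was considered not equal to" output in the question.
    intercept[TestFailedException] {
      assertDataFrameEquals(actual, expected)
    }
  }
}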

Code for the expected DF:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{count, lit}

val windowSpec = Window.partitionBy("agerange")

// one count per agerange via a window, de-duplicated, with the run date added
rangeDf
  .select("agerange")
  .withColumn("count", count("agerange").over(windowSpec).cast("Int"))
  .distinct()
  .withColumn("datadate", lit(runningDate.runningDateString))

0 Answers

There are no answers yet.