How do I compare two decimals in a Spark DataFrame and create a new column?

Asked: 2019-02-19 12:47:57

Tags: scala apache-spark dataframe decimal user-defined-functions

I have a Spark DataFrame in the following format:

// using spark 2.2 and scala 2.11
// both the columns are of type DecimalType(38,18) 
// my DataFrame is df    

+--------------------+--------------------+
| src_amount         |          dst_amount|
+--------------------+--------------------+
|4.600000000000000000|4.000000000000000000|
|6.000000000000000000|6.000000000000000000|
+--------------------+--------------------+

Now I want to create a new column named matchResult whose value is match if src_amount == dst_amount and nomatch otherwise. That is, the result should be:

+--------------------+--------------------+------------+
| src_amount         |          dst_amount| matchResult|
+--------------------+--------------------+------------+
|4.600000000000000000|4.000000000000000000|     nomatch|
|6.000000000000000000|6.000000000000000000|       match|
+--------------------+--------------------+------------+

To do this, I wrote a simple UDF:

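import java.math.BigDecimal
import org.apache.spark.sql.functions.udf

// Returns "match" when the two decimals are numerically equal (compareTo ignores scale).
val myFunc = udf((a: BigDecimal, b: BigDecimal) =>
  if (a.compareTo(b) == 0) "match" else "nomatch"
)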

df.withColumn("matchResult",myFunc(df("dst_amount"),df("src_amount")))

Is there a way to avoid the UDF? Does the DataFrame API have any built-in functionality for comparing decimal types?
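For comparison, here is a minimal sketch of what a UDF-free version might look like, using the built-in Column equality operator `===` together with `when` / `otherwise` from `org.apache.spark.sql.functions`. The object name `MatchResultSketch`, the helper `withMatchResult`, and the local SparkSession setup are hypothetical and exist only to make the example self-contained. The sample rows are rebuilt from the table above by casting strings to DecimalType(38,18).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.DecimalType

// Hypothetical wrapper object, used only to keep the sketch self-contained.
object MatchResultSketch {

  // Adds matchResult using built-in Column expressions only (no UDF).
  // With ===, a null on either side makes the comparison null, so the row
  // falls through to otherwise("nomatch"). The null-safe operator <=> could be
  // used instead if two nulls should count as a match.
  def withMatchResult(df: DataFrame): DataFrame =
    df.withColumn(
      "matchResult",
      when(col("src_amount") === col("dst_amount"), "match").otherwise("nomatch")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("match-result-sketch")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the sample data from the question as DecimalType(38,18) columns.
    val df = Seq(("4.6", "4.0"), ("6.0", "6.0"))
      .toDF("src_amount", "dst_amount")
      .select(
        col("src_amount").cast(DecimalType(38, 18)).as("src_amount"),
        col("dst_amount").cast(DecimalType(38, 18)).as("dst_amount")
      )

    withMatchResult(df).show(false)
    spark.stop()
  }
}
```

Keeping the comparison as a Column expression, rather than a UDF, leaves it visible to Spark's optimizer and avoids converting each value to a JVM BigDecimal. Unlike the UDF above, it also does not throw when one of the inputs is null.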

0 answers:

No answers yet