I have a Spark DataFrame in the following format:
// using spark 2.2 and scala 2.11
// both the columns are of type DecimalType(38,18)
// my DataFrame is df
+--------------------+--------------------+
| src_amount | dst_amount|
+--------------------+--------------------+
|4.600000000000000000|4.000000000000000000|
|6.000000000000000000|6.000000000000000000|
+--------------------+--------------------+
Now I want to create a new column named matchResult whose value is match when src_amount == dst_amount, and nomatch otherwise. That is, the result should be:
+--------------------+--------------------+------------+
| src_amount | dst_amount| matchResult|
+--------------------+--------------------+------------+
|4.600000000000000000|4.000000000000000000| nomatch|
|6.000000000000000000|6.000000000000000000| match|
+--------------------+--------------------+------------+
To do this, I wrote a simple udf:
import java.math.BigDecimal
import org.apache.spark.sql.functions.udf

// compareTo ignores scale, so 6.0 and 6.000000000000000000 compare equal
def myFunc = udf((a: BigDecimal, b: BigDecimal) => {
  if (a.compareTo(b) == 0) "match" else "nomatch"
})

df.withColumn("matchResult", myFunc(df("dst_amount"), df("src_amount")))
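As an aside on why the udf uses compareTo rather than equals: java.math.BigDecimal.equals is scale-sensitive, while compareTo compares numeric value only. A small standalone sketch (plain Scala, no Spark needed):

```scala
import java.math.BigDecimal

// Two numerically equal values with different scales
val a = new BigDecimal("6.000000000000000000") // scale 18, as in DecimalType(38,18)
val b = new BigDecimal("6.0")                  // scale 1

// equals() compares both value and scale, so these are "not equal"
val byEquals = a.equals(b)

// compareTo() compares numeric value only
val byCompare = a.compareTo(b) == 0

println(s"equals: $byEquals, compareTo == 0: $byCompare") // equals: false, compareTo == 0: true
```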
Is there a way to avoid the udf? Is there any built-in functionality in DataFrame for comparing decimal types?
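If I understand the built-ins correctly, the same result seems achievable without a udf by combining the === column comparison with when/otherwise. A sketch, assuming a local SparkSession just for illustration (column names and data taken from the example above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[*]").appName("decimal-compare").getOrCreate()
import spark.implicits._

// Recreate the example frame with DecimalType(38,18) columns
val df = Seq(("4.6", "4.0"), ("6.0", "6.0")).toDF("src_amount", "dst_amount")
  .withColumn("src_amount", col("src_amount").cast(DecimalType(38, 18)))
  .withColumn("dst_amount", col("dst_amount").cast(DecimalType(38, 18)))

// === on decimal columns compares numeric values directly, no udf needed
val result = df.withColumn("matchResult",
  when(col("src_amount") === col("dst_amount"), "match").otherwise("nomatch"))

result.show()
```

This keeps the comparison inside Catalyst, so it can be optimized, whereas a udf is a black box to the planner.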