Spark - Scala: How do I pass more than 10 arguments to a UDF?

Asked: 2016-12-16 13:16:17

Tags: scala apache-spark udf

I have a UDF to which I pass 16 arguments, but it throws an error that starts with "overloaded method udf".

val rec_cutoff = udf(fetch_rec_cutoff(_: String, _: String, _: String, _: String, _: String, 
      _: String, _: String, _: String, _: String, _: String, _: String, _: String, _: String, _: String, _: String, _: String))

  val EC_RECV_CUTOFF_df = EC_END_OF_DAY_df.withColumn("RECV_CUTOFF", rec_cutoff(EC_START_OF_DAY_df.col("STP_FLAG"),
    EC_END_OF_DAY_df.col("TXN_PAYMENT_TYPE"), EC_END_OF_DAY_df.col("COUNTRY_ORIG"),
    EC_END_OF_DAY_df.col("PROC_MODE"), EC_END_OF_DAY_df.col("Receiving_Cut_off"),
    EC_END_OF_DAY_df.col("REMIT_CURRENCY"), EC_END_OF_DAY_df.col("TXN_SOURCE_APPLICATION"),
    EC_END_OF_DAY_df.col("COUNTRY_DEST"), EC_END_OF_DAY_df.col("SUB_PAYMENT_TYPE"),
    EC_END_OF_DAY_df.col("Mon_Cutoff"), EC_END_OF_DAY_df.col("Tue_Cutoff"),
    EC_END_OF_DAY_df.col("Wed_Cutoff"), EC_END_OF_DAY_df.col("Thu_Cutoff"),
    EC_END_OF_DAY_df.col("Fri_Cutoff"), EC_END_OF_DAY_df.col("Sat_Cutoff"),
    EC_END_OF_DAY_df.col("Sun_Cutoff")))

    EC_END_OF_DAY_df.show(1000)

  def fetch_rec_cutoff(stp_flag: String, txn_pymt_type: String, country_orig: String,
      proc_mode: String, rec_cutoff: String, remit_currency: String, txn_source_app: String,
      dest_country: String, sub_payment_type: String, mon_cutoff: String, tue_cutoff: String,
      wed_cutoff: String, thu_cutoff: String, fri_cutoff: String, sat_cutoff: String,
      sun_cutoff: String): String = {
    " some logics"
  }

Looking at:

import org.apache.spark.sql.functions.udf 

only overloads for up to 10 arguments are provided.
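A commonly suggested workaround (a sketch of my own, not from the post; `StructUdfDemo`, the column names `c0..c15`, and `joinValues` are illustrative, and Spark 2.x is assumed) is to pack all the columns into a single `struct` column and let the UDF take a `Row` — `struct` accepts any number of columns, so the 10-argument limit on `udf(...)` never applies:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct, udf}

object StructUdfDemo {
  // Pure logic over the 16 values, kept separate so it is testable without Spark.
  def joinValues(values: Seq[String]): String = values.mkString("|")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("struct-udf").getOrCreate()

    // A one-row DataFrame with 16 string columns, standing in for EC_END_OF_DAY_df.
    val names = (0 until 16).map(i => s"c$i")
    val df = spark.range(1).select(names.map(n => lit(n).as(n)): _*)

    // The UDF receives one Row holding all 16 values; fields are read back
    // by position, in the same order they were packed into the struct.
    val recCutoff = udf { r: Row => joinValues((0 until 16).map(r.getString)) }

    df.withColumn("RECV_CUTOFF", recCutoff(struct(names.map(col): _*))).show(false)
    spark.stop()
  }
}
```

The same pattern covers the 16 cutoff columns from the snippet above: replace `c0..c15` with the real column names and put the body of `fetch_rec_cutoff` behind the `Row` accessor calls.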

So I tried to work around this using a Tuple.

{
   val udfTuple222 = udf(tupleFn(_: Tuple2[String, String]))

    val xxx = TSCdf.withColumn("newColumnName", udfTuple222(lit(col("Application_System"), col("Transaction_Status"))))

    xxx.show()

  }

  def tupleFn(tuple: Tuple2[String, String]): Int = {

    println("this is : " + tuple._1)
    println("this is tuple 2" + tuple._2)
    1
  }

Unfortunately, this also throws an error, as follows:

Exception in thread "main" java.lang.RuntimeException: Unsupported literal type class scala.Tuple2 (Application_System,Transaction_Status)
    at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:50)
    at org.apache.spark.sql.functions$.lit(functions.scala:117)
    at com.scb.cnc.payments.writer.Test$.main(Test.scala:69)
    at com.scb.cnc.payments.writer.Test.main(Test.scala)
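The likely cause (my reading, not from the post): `lit` takes a single value, so the compiler auto-tuples the two `Column` arguments into a `Tuple2`, and Catalyst has no literal type for tuples — hence the exception. Replacing `lit` with `struct` avoids creating a tuple at all; a minimal sketch, assuming Spark 2.x, with made-up sample data standing in for `TSCdf`:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

object TupleUdfFix {
  // The original tupleFn logic as a plain two-argument function.
  def tupleFn(a: String, b: String): Int = {
    println("this is : " + a)
    println("this is tuple 2: " + b)
    1
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("tuple-fix").getOrCreate()
    import spark.implicits._

    // Stand-in for the TSCdf DataFrame from the question.
    val TSCdf = Seq(("appA", "DONE")).toDF("Application_System", "Transaction_Status")

    // struct(...) combines the two columns into one Row-valued column,
    // so no Tuple2 ever reaches lit(...).
    val udfTuple222 = udf { r: Row => tupleFn(r.getString(0), r.getString(1)) }

    TSCdf.withColumn("newColumnName",
      udfTuple222(struct(col("Application_System"), col("Transaction_Status")))).show()
    spark.stop()
  }
}
```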

Please help.

0 Answers:

No answers