A UDF that uses a Java method breaks in Spark

Time: 2017-06-19 21:06:13

Tags: java scala apache-spark spark-dataframe udf

I got this code working in the Databricks environment, but when I try it in my local environment, it breaks:

  val _event_day_of_week = (event_date_of_event: String) => {
    import java.time.LocalDate
    import java.time.format.DateTimeFormatter

    val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
    val dayOfWeek: String = LocalDate.parse(event_date_of_event.substring(0,10), formatter).getDayOfWeek.toString
    dayOfWeek
  }

  val event_day_of_weekUDF = udf(_event_day_of_week)

  df.select($"uuid", event_day_of_weekUDF($"event_date_of_event") as "event_day_of_week").first

Error:

Exception in thread "main" java.lang.NullPointerException
    at com.faniak.ml.eventBuzz$.delayedEndpoint$com$faniak$ml$eventBuzz$1(eventBuzz.scala:72)
    at com.faniak.ml.eventBuzz$delayedInit$body.apply(eventBuzz.scala:17)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at com.faniak.ml.eventBuzz$.main(eventBuzz.scala:17)
    at com.faniak.ml.eventBuzz.main(eventBuzz.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)

The version is Spark 2.1.

1 answer:

Answer 0 (score: 1)

The UDF is not the problem. When prototyping on Apache Spark, do not extend the Scala trait App, because it does not work properly with Spark.

object EventBuzzDataset extends App{

To make it work, define an explicit main method instead:

object EventBuzzDataset {

  def main(args: Array[String]): Unit = {
    // the Spark code that used to sit in the body of the object goes here
  }
}
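
To check that the UDF logic itself is fine, it can be run outside Spark in an object with an explicit main method, as recommended above. This is only a minimal sketch: EventDayOfWeekCheck and eventDayOfWeek are hypothetical names, and the sample timestamp is made up for illustration. The body of the function is the same as in the question.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical object name; it uses an explicit main method instead of extending App.
object EventDayOfWeekCheck {

  private val formatter: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Same logic as the question's function: parse the first 10 characters
  // (yyyy-MM-dd) and return the name of the day of the week.
  def eventDayOfWeek(eventDate: String): String =
    LocalDate.parse(eventDate.substring(0, 10), formatter).getDayOfWeek.toString

  def main(args: Array[String]): Unit = {
    // Made-up sample value; 2017-06-19 was a Monday.
    println(eventDayOfWeek("2017-06-19 21:06:13")) // MONDAY
  }
}
```

If this prints the expected day, the function is sound and the failure is more likely to come from how the application object is initialized than from the UDF.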

The problem is described in detail here: https://issues.apache.org/jira/browse/SPARK-4170 and https://github.com/apache/spark/pull/3497
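
For context, here is a minimal sketch of the mechanism, assuming Scala 2.x (Spark 2.1 is built against Scala 2.11 by default). scala.App relies on DelayedInit, so the body of the object, including its vals, only runs when main is called. Code that touches the object without going through main may therefore see uninitialized fields. LazyBody and Demo are hypothetical names used only for illustration, and this may not be the exact path that fails in the question.

```scala
// Scala 2.x: the body of an object that extends App is deferred until main runs.
object LazyBody extends App {
  val greeting: String = "hello"
}

object Demo {
  def main(args: Array[String]): Unit = {
    // LazyBody.main has not been called yet, so on Scala 2.x the body
    // has not run and the field still holds its default value (null).
    println(LazyBody.greeting)

    // Running main executes the deferred body and initializes the field.
    LazyBody.main(Array.empty[String])
    println(LazyBody.greeting) // hello
  }
}
```

An object with a plain main method does not defer its initialization this way, which is why the explicit form is the safer choice with Spark.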

Thanks to @puhlen for the hint!
