How to write a Pig UDF in Scala

Asked: 2013-11-04 22:11:23

Tags: java eclipse scala apache-pig

I am trying to write a Pig UDF in Scala (using Eclipse). I added pig.jar as a library to the Java build path, and that seems to resolve the following two imports:

  • import org.apache.pig.EvalFunc
  • import org.apache.pig.data.Tuple

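However, I still get two errors that I cannot resolve:

  1. org.apache.pig.EvalFunc[T] does not have a constructor
  2. value get is not a member of org.apache.pig.data.Tuple (though I'm sure Tuple has a get method)

Here is the full code: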

    package datesUDFs
    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple
    class getYear extends EvalFunc {
      val extractDate = """^(\d\d\d\d)-\d\d-\d\d \d\d:\d\d:\d\d""".r
      def isDate(dtString: String): Boolean = extractDate.findFirstIn(dtString).nonEmpty
    
      override def exec(input: Tuple): Int = input.get(0) match {
        case dtString: String =>
          if (!isDate(dtString)) throw new IllegalArgumentException("Invalid date string!")
          else (for (extractDate(year) <- extractDate.findFirstIn(dtString)) yield year).head.toInt
        case _ => throw new IllegalArgumentException("Invalid function call!")
      }
    }
    

Can anyone help me solve this?

Thanks in advance!

2 answers:

Answer 0 (score: 1)

Your code looks fine to me, except that you have to specify the type parameter of EvalFunc.

    package datesUDFs
    import org.apache.pig.EvalFunc
    import org.apache.pig.data.Tuple
    class getYear extends EvalFunc[Int] { // This is the only line I changed.
      val extractDate = """^(\d\d\d\d)-\d\d-\d\d \d\d:\d\d:\d\d""".r
      def isDate(dtString: String): Boolean = extractDate.findFirstIn(dtString).nonEmpty

      override def exec(input: Tuple): Int = input.get(0) match {
        case dtString: String =>
          if (!isDate(dtString)) throw new IllegalArgumentException("Invalid date string!")
          else (for (extractDate(year) <- extractDate.findFirstIn(dtString)) yield year).head.toInt
        case _ => throw new IllegalArgumentException("Invalid function call!")
      }
    }

See whether that helps; ScalaIDE sometimes reports errors that are not really there.
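
Since the IDE can report misleading errors, it may help to check the date-parsing logic on its own, with no Pig classes involved. The following is only a minimal sketch: the names YearExtractor and yearOf are made up for illustration and appear nowhere in the question. It reuses the regular expression from the question and returns an Option instead of throwing.

```scala
object YearExtractor {
  // Same pattern as in the question: the string must start with
  // "yyyy-MM-dd HH:mm:ss"; the four-digit year is the only capture group.
  private val ExtractDate = """^(\d\d\d\d)-\d\d-\d\d \d\d:\d\d:\d\d""".r

  // Hypothetical helper: Some(year) when the string starts with a
  // timestamp in the expected layout, None otherwise.
  def yearOf(dtString: String): Option[Int] =
    ExtractDate.findFirstMatchIn(dtString).map(_.group(1).toInt)
}
```

For example, YearExtractor.yearOf("2013-11-04 22:11:23") gives Some(2013), and a string that does not start with a timestamp gives None. If this compiles and behaves as expected, the remaining errors are more likely to come from the build path than from the Scala code itself.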

Answer 1 (score: 0)

Solved! I added hadoop-common-2.2.0.jar and commons-logging-1.1.3.jar to my Java build path, and the problem went away.
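
A possible explanation, offered as an assumption rather than something stated in the thread: Pig's Tuple interface extends Hadoop's WritableComparable, and EvalFunc holds a commons-logging Log, so the Scala compiler may be unable to fully resolve either type while those jars are missing. The sketch below is one way to check whether such classes are visible; ClasspathCheck and isOnClasspath are hypothetical names, and the check covers the classpath the program runs with, which in Eclipse normally mirrors the project's build path.

```scala
object ClasspathCheck {
  // Returns true if the named class can be located without initializing it.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // Assumed to be the classes the UDF needs at compile time:
    // one from pig.jar, one from hadoop-common, one from commons-logging.
    val required = Seq(
      "org.apache.pig.EvalFunc",
      "org.apache.hadoop.io.WritableComparable",
      "org.apache.commons.logging.Log"
    )
    required.foreach { name =>
      val status = if (isOnClasspath(name)) "found" else "MISSING"
      println(s"$name: $status")
    }
  }
}
```

Running ClasspathCheck from the same Eclipse project prints one line per class; any line marked MISSING points at a jar that still needs to be added.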