How can Hive UDFs, UDAFs, and UDTFs written in Java be debugged in an IDE such as Eclipse?

Time: 2016-05-09 09:42:45

Tags: hadoop hive apache-pig hive-udf

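For example, this approach works for debugging Pig UDFs: http://ben-tech.blogspot.ie/2011/08/how-to-debug-pig-udfs-in-eclipse.html

I have a Hive script in which a UDAF I use is failing, so I would like to step through the UDF code.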

2 answers:

Answer 0: (score: 2)

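Because a UDF is just a Java class, you can write a JUnit test for it and debug that test from the Eclipse IDE: set a breakpoint in the UDF and run the test with Debug As > JUnit Test.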

Consider this UDF.

Example 1

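import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

class SimpleHelloWorldUDFExample extends UDF {
  // Hive looks up evaluate() by reflection; a null input yields a null result.
  public Text evaluate(Text input) {
    if (input == null) return null;
    return new Text("Hello " + input.toString());
  }
}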

The JUnit test method looks like this:

@Test
public void testUDFNullCheck() {
  SimpleHelloWorldUDFExample example = new SimpleHelloWorldUDFExample();
  Assert.assertNull(example.evaluate(null));
}

Example 2

package com.hive.udftest;

import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

class HiveUDFTest extends GenericUDF {

  ListObjectInspector listOI;
  StringObjectInspector elementOI;

  @Override
  public String getDisplayString(String[] arg0) {
    return "arrayContainsExample()"; // this should probably be better
  }

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 2) {
      throw new UDFArgumentLengthException("arrayContainsExample only takes 2 arguments: List<T>, T");
    }
    // 1. Check we received the right object types.
    ObjectInspector a = arguments[0];
    ObjectInspector b = arguments[1];
    if (!(a instanceof ListObjectInspector) || !(b instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list / array, second argument must be a string");
    }
    this.listOI = (ListObjectInspector) a;
    this.elementOI = (StringObjectInspector) b;

    // 2. Check that the list contains strings
    if(!(listOI.getListElementObjectInspector() instanceof StringObjectInspector)) {
      throw new UDFArgumentException("first argument must be a list of strings");
    }

    // the return type of our function is a boolean, so we provide the correct object inspector
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {

    // get the list and string from the deferred objects using the object inspectors
    List<String> list = (List<String>) this.listOI.getList(arguments[0].get());
    String arg = elementOI.getPrimitiveJavaObject(arguments[1].get());

    // check for nulls
    if (list == null || arg == null) {
      return null;
    }

    // see if our list contains the value we need
    for (String s : list) {
      if (arg.equals(s)) return Boolean.TRUE;
    }
    return Boolean.FALSE;
  }

}

The JUnit test case would be:

package com.hive.udftest;

import java.util.ArrayList;
import java.util.List;

import org.junit.Assert;

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.JavaBooleanObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.junit.Test;

public class HiveUDFTestTest {


  @Test
  public void testComplexUDFReturnsCorrectValues() throws HiveException {

    // set up the models we need
    HiveUDFTest example = new HiveUDFTest();
    ObjectInspector stringOI = PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    ObjectInspector listOI = ObjectInspectorFactory.getStandardListObjectInspector(stringOI);
    JavaBooleanObjectInspector resultInspector = (JavaBooleanObjectInspector) example.initialize(new ObjectInspector[]{listOI, stringOI});

    // create the actual UDF arguments
    List<String> list = new ArrayList<String>();
    list.add("a");
    list.add("b");
    list.add("c");

    // test our results

    // the value exists
    Object result = example.evaluate(new DeferredObject[]{new DeferredJavaObject(list), new DeferredJavaObject("a")});
    Assert.assertEquals(true, resultInspector.get(result));

    // the value doesn't exist
    Object result2 = example.evaluate(new DeferredObject[]{new DeferredJavaObject(list), new DeferredJavaObject("d")});
    Assert.assertEquals(false, resultInspector.get(result2));

    // arguments are null
    Object result3 = example.evaluate(new DeferredObject[]{new DeferredJavaObject(null), new DeferredJavaObject(null)});
    Assert.assertNull(result3);
  }
}

UDAFs and UDTFs can be tested and debugged in a similar way.
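For a UDAF, the same idea amounts to calling the evaluator's lifecycle methods from a test in roughly the order Hive would, so that a breakpoint in any of them can be hit from the IDE. The following is only a minimal sketch under stated assumptions: `SumEvaluatorSketch` and `UdafLifecycleSketch` are hypothetical names, and the evaluator is a plain class that merely mirrors the method names of Hive's legacy `UDAFEvaluator` convention (`init`, `iterate`, `terminatePartial`, `merge`, `terminate`) so that the example compiles without Hive on the classpath. In a real project you would drive your own evaluator class instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: mirrors the method names of Hive's legacy
// UDAFEvaluator convention, but does not extend any Hive class.
class SumEvaluatorSketch {
    private long sum;
    private boolean empty;

    // Resets the aggregation state.
    public void init() {
        sum = 0L;
        empty = true;
    }

    // Called once per input row; null rows are skipped.
    public boolean iterate(Long value) {
        if (value != null) {
            sum += value;
            empty = false;
        }
        return true;
    }

    // Returns the partial result of one task (null if no rows were seen).
    public Long terminatePartial() {
        return empty ? null : sum;
    }

    // Folds another task's partial result into this evaluator.
    public boolean merge(Long partial) {
        if (partial != null) {
            sum += partial;
            empty = false;
        }
        return true;
    }

    // Returns the final result (null if nothing was aggregated).
    public Long terminate() {
        return empty ? null : sum;
    }
}

class UdafLifecycleSketch {
    // Simulates two map-side evaluators whose partial results are merged
    // by a third, reduce-side evaluator.
    static Long aggregate(List<Long> partitionA, List<Long> partitionB) {
        SumEvaluatorSketch mapA = new SumEvaluatorSketch();
        mapA.init();
        for (Long v : partitionA) mapA.iterate(v);

        SumEvaluatorSketch mapB = new SumEvaluatorSketch();
        mapB.init();
        for (Long v : partitionB) mapB.iterate(v);

        SumEvaluatorSketch reducer = new SumEvaluatorSketch();
        reducer.init();
        reducer.merge(mapA.terminatePartial());
        reducer.merge(mapB.terminatePartial());
        return reducer.terminate();
    }

    public static void main(String[] args) {
        // prints 10
        System.out.println(aggregate(Arrays.asList(1L, 2L, null), Arrays.asList(3L, 4L)));
    }
}
```

Inside a JUnit test, the body of `aggregate` would sit in a `@Test` method with assertions on the returned value. Splitting the input into two partitions is a deliberate choice here, because it exercises `terminatePartial` and `merge`, which a single-evaluator test would never reach.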

Answer 1: (score: 0)

This is a nice blog post with example test cases.

http://www.spryinc.com/blog/making-use-aspectj-test-hive-udtfs