Question

I am studying Impala / Hive UDF examples, such as:

public class FuzzyEqualsUdf extends UDF {
    public FuzzyEqualsUdf() {
    }

    public BooleanWritable evaluate(DoubleWritable x, DoubleWritable y) {
        double EPSILON = 0.000001f;
        if (x == null || y == null)
            return null;
        return new BooleanWritable(Math.abs(x.get() - y.get()) < EPSILON);
    }
}

I then tried to create my own UDF that takes a String as input and returns a String as output. Ideally, it would look like this:

public class MyUdf extends UDF {
    public MyUdf() {
    }

    public StringWritable evaluate(StringWritable x) {
        String[] y = x.split(",");
        String z = y[0] + "|" + y[1];
        return new StringWritable(z);
    }
}

However, my problem is that there is no StringWritable class! I only see:

import org.apache.hadoop.hive.serde2.io.ByteWritable;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.io.ShortWritable;
import org.apache.hadoop.hive.serde2.io.TimestampWritable;

How can I create a UDF with String input/output when there is no StringWritable class? Thanks!

Answer 1

Edamame, perhaps you can use the org.apache.hadoop.io.Text class.

You can refer to Hive's built-in functions. One example is Trim, which takes a string and returns a string:

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java

Answer 2

It turns out that using the Java String type for input/output works fine.
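A minimal sketch of what that could look like for the split-and-join logic from the question. To keep it compilable with the JDK alone, the Hive base class is left out here; this is an assumption of the sketch, and in a real UDF the class would be declared `public class MyUdf extends UDF` (with `org.apache.hadoop.hive.ql.exec.UDF` imported from hive-exec), as in the FuzzyEqualsUdf example. The null and short-input handling is my own addition, not something the original post specifies.

```java
// Sketch only. In an actual Hive / Impala UDF this would be:
//   public class MyUdf extends org.apache.hadoop.hive.ql.exec.UDF
// The base class is omitted so the string logic compiles on its own.
class MyUdf {

    // Plain java.lang.String in, plain java.lang.String out.
    // No Writable wrapper is used for the argument or the result.
    public String evaluate(String x) {
        // Mirror the null handling of FuzzyEqualsUdf: null in, null out.
        if (x == null) {
            return null;
        }
        String[] parts = x.split(",");
        // Assumption: return null when fewer than two fields are present,
        // instead of throwing ArrayIndexOutOfBoundsException.
        if (parts.length < 2) {
            return null;
        }
        return parts[0] + "|" + parts[1];
    }
}
```

With this logic, an input of "a,b,c" yields "a|b", and a null or single-field input yields null.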


Additional Impala documentation: http://impala.io/doc/html/TestUdf_8java_source.html

Hive / Impala UDF with string input/output
