Question

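I want to write a Hive UDF that accepts a variable number of arguments (of different types) and outputs them as a JSON blob (column names mapped to column values).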

Select userId, myudf(col2, col3) from TABLE 2; // the output of udf should be {"col2":50, "col3":"Y" }

Select userId, myudf(col2, col3, col4) from TABLE 1; // the output of udf should be {"col2":"s", "col3":5, "col4":"Y"}

Select userId, myudf(col2, col3, col4, col6, col7) from TABLE 3; //the output of udf should be {"col2":"M", "col3":"A", "col4":2.5, "col6":"D", "col7":99 }

Each table has different columns with different types (userId is common to all tables). I can pass the column names separately if that helps: myudf("col2", col2, "col3", col3). Any ideas would be appreciated.

Answer 1

You should use a GenericUDF object (instead of a plain UDF object).

Mark Grover wrote a blog post about this: http://mark.thegrovers.ca/tech-blog/how-to-write-a-hive-udf

Here is the relevant source code: https://github.com/markgrover/hive-translate/blob/master/src/main/java/org/mgrover/hive/translate/GenericUDFTranslate.java
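
To make the idea concrete, here is a minimal sketch of the part that turns alternating name/value arguments into a JSON object, using the myudf("col2", col2, "col3", col3) calling style from the question. It is plain Java with no Hive dependency, so it is not a complete UDF: the class name JsonBlob and the method toJson are hypothetical. In a real GenericUDF you would presumably call logic like this from evaluate(), after converting each DeferredObject to a Java value with the ObjectInspectors received in initialize().

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Hive: builds a JSON object from
// alternating (name, value) arguments.
class JsonBlob {

    static String toJson(Object... args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < args.length; i += 2) {
            String name = String.valueOf(args[i]);
            json.add(quote(name) + ":" + render(args[i + 1]));
        }
        return json.toString();
    }

    // Numbers and booleans are emitted bare, everything else as a string.
    private static String render(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Double && !Double.isFinite((Double) value)) {
            return "null"; // JSON has no NaN or Infinity
        }
        if (value instanceof Float && !Float.isFinite((Float) value)) {
            return "null";
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        return quote(value.toString());
    }

    // Wraps a string in double quotes and escapes characters JSON requires.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson("col2", 50, "col3", "Y"));
        System.out.println(toJson("col2", "s", "col3", 5, "col4", "Y"));
    }
}
```

Passing the names explicitly keeps the sketch simple, because the function only sees the argument values and not the column names they came from. In production code a JSON library would be a safer choice than hand-written escaping.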
