I have the following source code for a custom Hive UDF, and it works correctly:
package com.mycompany.strings;

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Testudf extends UDF
{
    public Text evaluate(Text inputstring)
    {
        return new Text(inputstring.toString().toLowerCase());
    }
}
This UDF converts a string to lowercase. I have tested it by creating a function for it in Hive and invoking it, and it does return the expected output. The following Hive query returns the lowercase strings:
select myfunction(somestringcolumn) from sometable limit 10;
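For completeness, the function was registered along these lines before running the query (the jar path below is a placeholder, not the exact path I used):

add jar /path/to/testudf-1.0.jar;
create temporary function myfunction as 'com.mycompany.strings.Testudf';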
I then made the following changes to the base code:
package com.mycompany.strings;

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Testudf extends UDF
{
    public Text evaluate(Text inputstring) // input is purposely not even used
    {
        String myjson = "{ \"name\":\"jack\", \"city\":\"london\" }";
        JsonPath.parse(myjson); // this line seems to be causing the issue
        return new Text("hello"); // hard coded return on purpose
    }
}
As you can see, the code is very simple. The input value plays no role and the output is purely hard-coded, to rule out any problem arising from the computation itself. Likewise, the JSON parse call takes a hard-coded value as its input. Nevertheless, when I create a function for it in Hive and run it, I get the following exception:
select myfunction(somestringcolumn) from sometable limit 10;
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public org.apache.hadoop.io.Text com.mycompany.strings.Testudf.evaluate(org.apache.hadoop.io.Text) on object com.mycompany.strings.Testudf@40d96578 of class com.mycompany.strings.Testudf with arguments {Unknown:org.apache.hadoop.io.Text} of size 1
How can I fix this? It may be a problem referencing the library for the package com.jayway.jsonpath, because the exception only appears when the code calls JsonPath.parse, even though I do not even use the output of JsonPath.parse in the return value; the return value itself is purely hard-coded.
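To double-check that the library call itself is not at fault, the same hard-coded parse can be exercised in a plain Java main outside Hive. A minimal sketch (the class name is just for illustration, and it assumes json-path 2.4.0 and its dependencies are on the classpath):

package com.mycompany.strings;

import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

// Standalone check of the same hard-coded parse, outside the Hive runtime.
public class JsonPathStandaloneCheck
{
    public static void main(String[] args)
    {
        String myjson = "{ \"name\":\"jack\", \"city\":\"london\" }";
        DocumentContext ctx = JsonPath.parse(myjson); // same call as in the UDF
        String name = ctx.read("$.name");             // expected to be "jack"
        System.out.println(name);
    }
}

I would expect this to print jack without any exception when run directly with java, which is what makes the failure inside Hive puzzling.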
I am building the jar with Maven. Here is my pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.mycompany.strings</groupId>
    <artifactId>testudf</artifactId>
    <packaging>jar</packaging>
    <version>1.0</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <repositories>
        <repository>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
            <id>hortonworks.extrepo</id>
            <name>Hortonworks HDP</name>
            <url>http://repo.hortonworks.com/content/repositories/releases</url>
        </repository>
        <repository>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
            <id>hortonworks.other</id>
            <name>Hortonworks Other Dependencies</name>
            <url>http://repo.hortonworks.com/content/groups/public</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>com.jayway.jsonpath</groupId>
            <artifactId>json-path</artifactId>
            <version>2.4.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1000.2.6.4.0-91</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>0.20.2</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Maven seems to be producing two files: the original original-testudf-1.0.jar (about 4 KB) and testudf-1.0.jar (100+ MB). I am using the smaller one, i.e. the original file original-testudf-1.0.jar.
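In case it matters, my understanding is that the smaller file is the unshaded jar (renamed by the shade plugin to original-testudf-1.0.jar), while the large one is the shaded fat jar with all dependencies bundled. For illustration only, the shade plugin could also attach the fat jar under its own classifier so the two outputs are easier to tell apart (this is a sketch, not part of my actual pom):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <configuration>
        <createDependencyReducedPom>false</createDependencyReducedPom>
        <!-- sketch: keep testudf-1.0.jar as the thin jar and also attach
             the shaded fat jar as testudf-1.0-uber.jar -->
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>uber</shadedClassifierName>
    </configuration>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
        </execution>
    </executions>
</plugin>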