Error when running a Hive UDF written in Java from PySpark on EMR 5.x

Asked: 2017-01-09 10:22:29

Tags: apache-spark pyspark amazon-emr hive-udf

I have a Hive UDF written in Java that I am trying to use from PySpark 2.0.0. These are the steps:

1. Copy the jar file to the EMR cluster.

2. Start PySpark with the jar on the classpath:

pyspark --jars ip-udf-0.0.1-SNAPSHOT-jar-with-dependencies-latest.jar

3. Register the UDF with the code below:

from pyspark.sql import SparkSession
from pyspark.sql import HiveContext
sc = spark.sparkContext  # `spark` is the SparkSession the pyspark shell creates
sqlContext = HiveContext(sc)
sqlContext.sql("create temporary function ip_map as 'com.mediaiq.hive.IPMappingUDF'")
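
Once the temporary function is registered, it would be invoked from Spark SQL like any built-in function. A minimal sketch; the table and column names (`logs`, `ip_address`) are hypothetical, not from the original question:

```sql
-- Hypothetical usage of the registered UDF against a Hive table
SELECT ip_address, ip_map(ip_address) AS mapped_value
FROM logs
LIMIT 10;
```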

I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o43.sql.
: java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:76)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
    at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
    at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
    at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 0)

You may have built your UDF against a different version of Hive than the one running on the cluster. Be sure to specify the same Hive version in the pom.xml used to build the jar containing your UDF. See, for example, this previous answer.
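
The NoSuchMethodError above is the classic symptom: a "jar-with-dependencies" that bundles Hive classes from one version shadows the cluster's own Hive jars at runtime. A sketch of the relevant pom.xml dependency; the version number shown is hypothetical and should be matched to the cluster's actual Hive (e.g. run `hive --version` on the EMR master node):

```xml
<!-- Sketch only. hive-exec supplies org.apache.hadoop.hive.ql.exec.UDF.
     `provided` scope keeps Hive classes out of the assembled jar, so the
     jar-with-dependencies cannot shadow the cluster's Hive at runtime. -->
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>2.1.0</version>  <!-- hypothetical; use the cluster's Hive version -->
  <scope>provided</scope>
</dependency>
```

Rebuilding the jar with the matching version (or with Hive marked `provided`) and restarting PySpark with `--jars` should avoid the mismatched `HiveConf.getTimeVar` signature.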