I have a Hive UDF written in Java that I am trying to use in pyspark 2.0.0. Here are the steps:
1. Copy the jar file to EMR.
2. Start a pyspark job as shown below:
pyspark --jars ip-udf-0.0.1-SNAPSHOT-jar-with-dependencies-latest.jar
3. Access the UDF using the code below:
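from pyspark.sql import SparkSession
from pyspark.sql import HiveContext
# `spark` is the SparkSession that the pyspark shell creates at startup
sc = spark.sparkContext
sqlContext = HiveContext(sc)
# Register the Java UDF class from the jar under the name ip_map
sqlContext.sql("create temporary function ip_map as 'com.mediaiq.hive.IPMappingUDF'")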
I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o43.sql.
: java.lang.NoSuchMethodError: org.apache.hadoop.hive.conf.HiveConf.getTimeVar(Lorg/apache/hadoop/hive/conf/HiveConf$ConfVars;Ljava/util/concurrent/TimeUnit;)J
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:76)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:189)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:258)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:359)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:263)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive$lzycompute(HiveSharedState.scala:39)
    at org.apache.spark.sql.hive.HiveSharedState.metadataHive(HiveSharedState.scala:38)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog$lzycompute(HiveSharedState.scala:46)
    at org.apache.spark.sql.hive.HiveSharedState.externalCatalog(HiveSharedState.scala:45)
    at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:50)
    at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
    at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
    at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Answer 0 (score: 0)
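You may have built the UDF against a different version of Hive than the one your Spark installation uses. Make sure the pom.xml used to build the jar containing the UDF specifies that same Hive version. For an example, see this previous answer.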
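One way to check whether the jar itself carries conflicting Hive classes is to list its contents. The sketch below assumes the `jar-with-dependencies` build bundles its own copy of Hive (the jar name suggests this, but the question does not confirm it). The helper name `find_bundled_classes` and the default package prefix are illustrative choices, not part of any Spark or Hive API.

```python
import zipfile


def find_bundled_classes(jar_path, package_prefix="org/apache/hadoop/hive/"):
    """Return the sorted .class entries in a jar that live under package_prefix.

    A jar is a zip archive, so the standard zipfile module can read it.
    The default prefix (an assumption for this question) matches Hive's own
    classes, such as the HiveConf class named in the NoSuchMethodError.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith(package_prefix) and name.endswith(".class")
        )
```

If `find_bundled_classes("ip-udf-0.0.1-SNAPSHOT-jar-with-dependencies-latest.jar")` returns a non-empty list that includes `org/apache/hadoop/hive/conf/HiveConf.class`, the jar is probably shadowing the Hive classes that Spark expects. In that case, aligning the Hive version in the pom.xml, or excluding Hive from the bundled dependencies, would be the thing to try.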