How to run queries on a Hive table that uses JsonSerDe from the Spark interpreter in Zeppelin / HDP 2.4?

Asked: 2016-12-25 05:42:09

Tags: apache-spark hive hortonworks-data-platform apache-zeppelin

I am unable to run Hive queries with the Spark interpreter in Zeppelin.

I can load the Hive context. The Hive table uses org.apache.hive.hcatalog.data.JsonSerDe, but every query fails with: ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe.
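For context, a table that triggers this error is one declared with the HCatalog JSON SerDe. A hypothetical DDL sketch (the table name, columns, and location are made up for illustration):

```sql
-- Hypothetical example: any table declared with this SerDe requires
-- hive-hcatalog-core on the Spark classpath at query time.
CREATE EXTERNAL TABLE events (
  id STRING,
  payload STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/events';
```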

I added spark.executor.extraClassPath and spark.driver.extraClassPath to the Spark configuration.
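The question does not show the exact values used, but these properties are typically set like the following sketch, either in spark-defaults.conf or in the Zeppelin Spark interpreter settings. The jar path is an assumption; locate the actual hive-hcatalog-core jar on your cluster first (on HDP it commonly sits under the webhcat share directory):

```properties
# Sketch only -- the jar path below is an assumption; point it at the
# real hive-hcatalog-core jar on your installation.
spark.driver.extraClassPath    /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
spark.executor.extraClassPath  /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
```

Note that both driver and executor entries are needed: the plan shown in the stack trace resolves the SerDe class on the driver, and the executors need it again when reading the data.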

I also tried copying the jar into the /interpreter/spark/dep folder, with no luck.

I added it in the Zeppelin interpreter configuration as well. I'm stuck — please help.

Stack trace:

java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe
    at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:74)
    at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:90)
    at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:73)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$3.apply(HiveStrategies.scala:77)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$3.apply(HiveStrategies.scala:77)
    at org.apache.spark.sql.execution.SparkPlanner.pruneFilterProject(SparkPlanner.scala:79)
    at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:73)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:217)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134)
    at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
    at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
    at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
    at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
    at $iwC$$iwC$$iwC.<init>(<console>:45)
    at $iwC$$iwC.<init>(<console>:47)
    at $iwC.<init>(<console>:49)
    at <init>(<console>:51)
    at .<init>(<console>:55)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:709)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:673)
    at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:666)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:295)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.data.JsonSerDe
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
    at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:71)

1 Answer:

Answer 0 (score: 0)

You need to copy hive/lib/hive-hcatalog-core-xxx.jar to spark/jars/.
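A rough command sketch of that suggestion (the exact directories and jar version are installation-specific assumptions, not given in the answer; locate the real jar first, and restart the Zeppelin Spark interpreter afterwards so the new jar is picked up):

```shell
# Sketch only: $HIVE_HOME and $SPARK_HOME must point at your actual
# installs; the glob matches whatever hive-hcatalog-core version you have.
cp "$HIVE_HOME"/lib/hive-hcatalog-core-*.jar "$SPARK_HOME"/jars/
```

Copying the jar into Spark's own jar directory puts the SerDe class on the classpath of both the driver and every executor, which is why it works where the interpreter-level dependency settings did not.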