Error accessing ORC files from pyspark with the newAPIHadoopFile API, Spark 1.2

Date: 2016-07-31 20:14:36

Tags: pyspark orc

Can you tell me how to resolve java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()?

Command used to launch pyspark:

pyspark --jars "hive-exec-0.13.1-cdh5.3.3.jar,hadoop-common-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-app-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-common-2.5.0-cdh5.3.3.jar,hadoop-mapreduce-client-core-2.5.0-cdh5.3.3.jar,hadoop-core-2.5.0-mr1-cdh5.3.3.jar,hive-metastore-0.13.1-cdh5.3.3.jar"

Command executed in the pyspark shell:

distFile = sc.newAPIHadoopFile(path="orcdatafolder/", inputFormatClass="org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat", keyClass="org.apache.hadoop.io.NullWritable", valueClass="org.apache.hadoop.hive.ql.io.orc.OrcStruct")
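
For context, a minimal self-contained sketch of the same read follows. The SparkContext construction, the appName, and the final first() action are additions for completeness (the shell already provides sc); "orcdatafolder/" is the question's own path.

from pyspark import SparkContext

sc = SparkContext(appName="orc-read-sketch")

# Same call as in the shell session above: OrcNewInputFormat yields
# (NullWritable, OrcStruct) pairs, one per row of the ORC files.
distFile = sc.newAPIHadoopFile(
    path="orcdatafolder/",
    inputFormatClass="org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.apache.hadoop.hive.ql.io.orc.OrcStruct")

# Any action that pulls rows back to Python goes through Spark's
# clone/convert path, which is where the stack trace below originates.
print(distFile.first())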

Error:

16/07/31 19:49:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, sj1dra096.corp.adobe.com): java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.io.WritableUtils.clone(WritableUtils.java:217)
    at org.apache.spark.api.python.WritableToJavaConverter.org$apache$spark$api$python$WritableToJavaConverter$$convertWritable(PythonHadoopUtil.scala:96)
    at org.apache.spark.api.python.WritableToJavaConverter.convert(PythonHadoopUtil.scala:104)
    at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:183)
    at org.apache.spark.api.python.PythonHadoopUtil$$anonfun$convertRDD$1.apply(PythonHadoopUtil.scala:183)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$26.apply(RDD.scala:1081)
    at org.apache.spark.rdd.RDD$$anonfun$26.apply(RDD.scala:1081)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1319)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1319)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.io.orc.OrcStruct.<init>()
    at java.lang.Class.getConstructor0(Class.java:2849)
    at java.lang.Class.getDeclaredConstructor(Class.java:2053)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:125)
    ... 28 more
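
The immediate cause is readable from the last frames: WritableUtils.clone calls ReflectionUtils.newInstance, which looks up a no-argument constructor via getDeclaredConstructor, and OrcStruct does not expose one. A hypothetical diagnostic run in the same pyspark shell, using the py4j gateway that pyspark exposes as sc._jvm, would make this visible (assuming hive-exec is on the driver classpath):

# List OrcStruct's declared constructors through the py4j gateway.
# Expect no zero-argument OrcStruct() constructor, which is exactly
# what ReflectionUtils.newInstance fails to find above.
cls = sc._jvm.java.lang.Class.forName(
    "org.apache.hadoop.hive.ql.io.orc.OrcStruct")
for ctor in cls.getDeclaredConstructors():
    print(ctor)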

0 Answers:

No answers.