Unable to read HBase data through Spark

Date: 2018-05-04 00:43:06

Tags: apache-spark amazon-s3 hbase amazon-emr

I wrote a simple program to read data from HBase; it works fine on Cloudera backed by HDFS.

But I hit an exception when testing it on EMR with data stored on S3.

    // Imports used by this snippet
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Spark conf
    SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("My App");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    // HBase conf
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "localhost");
    conf.set("hbase.zookeeper.property.client.port", "2181");

    // Submit scan into hbase conf
    // conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));

    conf.set(TableInputFormat.INPUT_TABLE, "mytable");
    conf.set(TableInputFormat.SCAN_ROW_START, "startrow");
    conf.set(TableInputFormat.SCAN_ROW_STOP, "endrow");

    // Get RDD
    JavaPairRDD<ImmutableBytesWritable, Result> source = jsc.newAPIHadoopRDD(
            conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

    // Process RDD
    System.out.println("&&&&&&&&&&&&&&&&&&&&&&& " + source.count());
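For context, the commented-out `TableMapReduceUtil.convertScanToString` line is the other way to hand `TableInputFormat` its scan range. A minimal sketch of that route, assuming the same "startrow"/"endrow" bounds as the string properties above (the helper name `applyScan` is made up for illustration):

    // Hypothetical helper sketching the commented-out Scan route above.
    // convertScanToString serializes the Scan into the job configuration,
    // where TableInputFormat reads it back during initialization.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.util.Bytes;

    static void applyScan(Configuration conf) throws IOException {
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("startrow")); // inclusive
        scan.setStopRow(Bytes.toBytes("endrow"));    // exclusive
        conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
    }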

    18/05/04 00:22:02 INFO MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
    18/05/04 00:22:02 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
    Caused by: java.lang.IllegalAccessError: tried to access class org.apache.hadoop.metrics2.lib.MetricsInfoImpl from class org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry
        at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newGauge(DynamicMetricsRegistry.java:139)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:59)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl.<init>(MetricsZooKeeperSourceImpl.java:51)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
        ... 42 more

    Exception in thread "main" java.io.IOException: Cannot create a record reader because of a previous error. Please look at the previous logs lines from the task's full log for more details.
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:270)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
        at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
        at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
        at HbaseScan.main(HbaseScan.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.IllegalStateException: The input format instance has not been properly initialized. Ensure you call initializeTable either in your constructor or initialize method
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getTable(TableInputFormatBase.java:652)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:265)
        ... 20 more
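The "previous logs lines" this exception points back to are the `IllegalAccessError` in the first trace. `MetricsInfoImpl` is package-private in hadoop-common, so an `IllegalAccessError` from `DynamicMetricsRegistry` (shipped in the HBase hadoop-compat module) typically means the two classes were loaded from incompatible jar versions. A hedged way to check which jars actually supply them; the class names come straight from the trace, and `WhichJar` is just an illustrative name:

    // Diagnostic sketch: print the jar each suspect class is loaded from.
    import java.security.CodeSource;

    public class WhichJar {
        public static void main(String[] args) throws ClassNotFoundException {
            String[] suspects = {
                "org.apache.hadoop.metrics2.lib.MetricsInfoImpl",
                "org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry",
                "org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSourceImpl"
            };
            for (String name : suspects) {
                Class<?> c = Class.forName(name);
                // getCodeSource() is null for bootstrap classes.
                CodeSource src = c.getProtectionDomain().getCodeSource();
                System.out.println(name + " -> "
                        + (src == null ? "<bootstrap>" : src.getLocation()));
            }
        }
    }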
ALL APACHE HBASE LIBS:

    ... Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
    18/05/04 04:05:54 ERROR TableInputFormat: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:202)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:259)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:256)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1158)
        at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
        at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
        at HbaseScan.main(HbaseScan.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
        ... 24 more
    Caused by: java.lang.RuntimeException: Could not create interface org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource Is the hadoop compatibility jar on the classpath?
        at org.apache.hadoop.hbase.CompatibilitySingletonFactory.getInstance(CompatibilitySingletonFactory.java:75)
        at org.apache.hadoop.hbase.zookeeper.MetricsZooKeeper.<init>(MetricsZooKeeper.java:38)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.<init>(RecoverableZooKeeper.java:130)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:143)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:181)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:155)
        at org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.<init>(ZooKeeperKeepAliveConnection.java:43)
        at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1737)
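The final "Caused by" asks the key question: is the hadoop compatibility jar on the classpath? `CompatibilitySingletonFactory` locates the `MetricsZooKeeperSource` implementation through `java.util.ServiceLoader` (visible in the `ServiceLoader$LazyIterator` frame of the first trace). A hedged standalone probe along those lines; everything here is plain JDK plus the interface named in the trace, and `CompatCheck` is an illustrative name:

    // Probe whether a MetricsZooKeeperSource implementation is visible
    // and instantiable via ServiceLoader, mirroring HBase's lookup.
    import java.util.ServiceLoader;
    import org.apache.hadoop.hbase.zookeeper.MetricsZooKeeperSource;

    public class CompatCheck {
        public static void main(String[] args) {
            boolean found = false;
            try {
                for (MetricsZooKeeperSource source :
                        ServiceLoader.load(MetricsZooKeeperSource.class)) {
                    System.out.println("Found: " + source.getClass().getName());
                    found = true;
                }
            } catch (Throwable t) {
                // An implementation is registered but fails to construct,
                // e.g. with the IllegalAccessError from the traces above.
                System.out.println("Implementation failed to load: " + t);
            }
            if (!found) {
                System.out.println("No implementation on the classpath; "
                        + "the hbase-hadoop2-compat jar is likely missing.");
            }
        }
    }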

0 Answers
