Incrementing column values in HBase from Spark

Time: 2016-09-20 07:00:11

Tags: apache-spark mapreduce spark-streaming mapr

I have a Spark Streaming application that reads data from RabbitMQ and saves it to HBase. The save is an increment operation. I am using saveAsNewAPIHadoopDataset, but I keep getting the exception below.

Code:

pairDStream.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {

                @Override
                public void call(JavaPairRDD<String, Integer> arg0)
                        throws Exception {

                    Configuration dbConf = HBaseConfiguration.create();
                    dbConf.set("hbase.table.namespace.mappings", "tablename:/mapr/tablename");

                    Job jobConf = Job.getInstance(dbConf);
                    jobConf.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "tablename");
                    jobConf.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);

                    JavaPairRDD<ImmutableBytesWritable, Increment> hbasePuts =  arg0.mapToPair(
                            new PairFunction<Tuple2<String,Integer>, ImmutableBytesWritable, Increment>() {

                                @Override
                                public Tuple2<ImmutableBytesWritable, Increment> call(
                                        Tuple2<String, Integer> arg0)
                                        throws Exception {

                                    String[] keys = arg0._1.split("_");

                                    Increment inc = new Increment(Bytes.toBytes(keys[0]));
                                    inc.addColumn(Bytes.toBytes("data"), 
                                            Bytes.toBytes(keys[1]), 
                                            arg0._2);

                                    return new Tuple2<ImmutableBytesWritable, Increment>(new ImmutableBytesWritable(), inc);   
                                }
                            });

                    // save to HBase - Spark built-in API method
                    hbasePuts.saveAsNewAPIHadoopDataset(jobConf.getConfiguration());
                }

            });

Exception:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most recent failure: Lost task 1.3 in stage 6.0 (TID 100, dev-arc-app036.vega.cloud.ironport.com): java.io.IOException: Pass a Delete or a Put
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:128)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:87)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
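
Looking at the trace, TableOutputFormat's TableRecordWriter.write only accepts Put or Delete mutations, which seems to be why the Increment objects are rejected. One workaround I am considering (an untested sketch; the connection handling and classes such as ConnectionFactory, Table and TableName from the HBase 1.x client API are my assumptions, not part of the failing job) is to skip TableOutputFormat entirely and call the client's increment directly inside foreachPartition:

pairDStream.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
    @Override
    public void call(JavaPairRDD<String, Integer> rdd) throws Exception {
        rdd.foreachPartition(new VoidFunction<Iterator<Tuple2<String, Integer>>>() {
            @Override
            public void call(Iterator<Tuple2<String, Integer>> records) throws Exception {
                // One connection and table handle per partition, closed when the partition is done.
                Configuration conf = HBaseConfiguration.create();
                conf.set("hbase.table.namespace.mappings", "tablename:/mapr/tablename");
                Connection connection = ConnectionFactory.createConnection(conf);
                Table table = connection.getTable(TableName.valueOf("tablename"));
                try {
                    while (records.hasNext()) {
                        Tuple2<String, Integer> record = records.next();
                        String[] keys = record._1.split("_");
                        // Same row key and column layout as above, but written as an
                        // Increment through the client API instead of TableOutputFormat.
                        Increment inc = new Increment(Bytes.toBytes(keys[0]));
                        inc.addColumn(Bytes.toBytes("data"), Bytes.toBytes(keys[1]), record._2.longValue());
                        table.increment(inc);
                    }
                } finally {
                    table.close();
                    connection.close();
                }
            }
        });
    }
});

(Imports assumed: java.util.Iterator plus org.apache.hadoop.hbase.TableName and org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Table, Increment}.)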

Still, is it possible to use the "saveAsNewAPIHadoopDataset" method with Increment instead of Put?
Any help is much appreciated.

Thanks,

Akhila。

0 Answers:

There are no answers yet.