I have a Spark Streaming job that reads data from RabbitMQ and saves it to HBase. The save is an increment operation. I am using saveAsNewAPIHadoopDataset, but I keep getting the exception below.
Code:
pairDStream.foreachRDD(new VoidFunction<JavaPairRDD<String, Integer>>() {
    @Override
    public void call(JavaPairRDD<String, Integer> arg0) throws Exception {
        Configuration dbConf = HBaseConfiguration.create();
        dbConf.set("hbase.table.namespace.mappings", "tablename:/mapr/tablename");
        Job jobConf = Job.getInstance(dbConf);
        jobConf.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "tablename");
        jobConf.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);

        JavaPairRDD<ImmutableBytesWritable, Increment> hbasePuts = arg0.mapToPair(
            new PairFunction<Tuple2<String, Integer>, ImmutableBytesWritable, Increment>() {
                @Override
                public Tuple2<ImmutableBytesWritable, Increment> call(Tuple2<String, Integer> arg0) throws Exception {
                    String[] keys = arg0._1.split("_");
                    Increment inc = new Increment(Bytes.toBytes(keys[0]));
                    inc.addColumn(Bytes.toBytes("data"),
                                  Bytes.toBytes(keys[1]),
                                  arg0._2);
                    return new Tuple2<ImmutableBytesWritable, Increment>(new ImmutableBytesWritable(), inc);
                }
            });

        // save to HBase - Spark built-in API method
        hbasePuts.saveAsNewAPIHadoopDataset(jobConf.getConfiguration());
    }
});
Exception:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 4 times, most recent failure: Lost task 1.3 in stage 6.0 (TID 100, dev-arc-app036.vega.cloud.ironport.com): java.io.IOException: Pass a Delete or a Put
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:128)
at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:87)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1250)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
Is it possible to use "saveAsNewAPIHadoopDataset" with Increments instead of Puts?
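If saveAsNewAPIHadoopDataset only accepts Put/Delete, the only workaround I can think of is to drop down to the plain HBase client API and send the increments from foreachPartition, roughly like the sketch below. This is just a sketch under assumptions: it assumes the HBase 1.x client classes (Connection, ConnectionFactory, Table) are available and reuses the same "tablename" table; I have not verified it against MapR-DB. I would still prefer to keep saveAsNewAPIHadoopDataset if there is a way.

// Additional imports this sketch would need, on top of the ones already used above:
import java.util.Iterator;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

// Inside foreachRDD, replacing the saveAsNewAPIHadoopDataset call:
// open one HBase connection per partition and send each Increment directly.
hbasePuts.foreachPartition(new VoidFunction<Iterator<Tuple2<ImmutableBytesWritable, Increment>>>() {
    @Override
    public void call(Iterator<Tuple2<ImmutableBytesWritable, Increment>> rows) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.table.namespace.mappings", "tablename:/mapr/tablename");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("tablename"))) {
            while (rows.hasNext()) {
                // Table.increment applies the counter update atomically on the server side
                table.increment(rows.next()._2);
            }
        }
    }
});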
Any help is greatly appreciated.
Thanks,
Akhila.