Spark insert into HBase

Date: 2016-04-10 13:55:14

Tags: hadoop apache-spark hbase spark-streaming

I have a POJO class like the one below. I am able to read the streaming data, and I want to insert that data into HBase.

@JsonInclude(Include.NON_NULL)
public class empData implements Serializable {

    private String id;
    private String name;

    @Override
    public String toString() {
        return "id=" + id + ", name="+ name ;
    }
    public String id() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }

}

Here is the Spark code:

empRecords.foreachRDD(new Function<JavaRDD<empData>, Void>() {

    private static final long serialVersionUID = 1L;

    @Override
    public Void call(JavaRDD<empData> empDataEvent) throws Exception {

        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "**********");
        HBaseAdmin.checkHBaseAvailable(config);
        config.set(TableInputFormat.INPUT_TABLE, "tableName");
        Job newAPIJobConfiguration1 = Job.getInstance(config);
        newAPIJobConfiguration1.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "empHbase");
        newAPIJobConfiguration1.setOutputFormatClass(org.apache.hadoop.hbase.mapreduce.TableOutputFormat.class);

        JavaPairRDD<ImmutableBytesWritable, Put> inserts = empDataEvent.mapToPair(
                new PairFunction<Row, ImmutableBytesWritable, Put>() {

                    public Tuple2<ImmutableBytesWritable, Put> call(Row row) throws Exception {
                        Put put = new Put(Bytes.toBytes(row.getString(0)));
                        put.add(Bytes.toBytes("empA"), Bytes.toBytes("id"), Bytes.toBytes(row.getString(1)));
                        put.add(Bytes.toBytes("empA"), Bytes.toBytes("name"), Bytes.toBytes(row.getString(2)));
                        return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
                    }
                });

        inserts.saveAsNewAPIHadoopDataset(newAPIJobConfiguration1.getConfiguration());
        return null;
    }
});
jssc.start();
jssc.awaitTermination();

The problem in the code is this step:

JavaPairRDD<ImmutableBytesWritable, Put> inserts = empDataEvent.mapToPair(new PairFunction<Row, ImmutableBytesWritable, Put>()

How do I use empDataEvent here, and how do I map the empData class objects in mapToPair so that I can insert them into HBase? Any help is appreciated.
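One possible approach (a sketch only, not tested against a real cluster): since empDataEvent is a JavaRDD&lt;empData&gt;, the PairFunction should be typed over empData rather than Row, and the Put can be built from the POJO's own accessors (`id()` and `getName()` in the class above). This reuses `empDataEvent`, the `"empA"` column family, and `newAPIJobConfiguration1` from the question's code; the choice of `id` as the row key is an assumption.

```java
// Sketch: map each empData object (not a Row) to an HBase Put.
// Assumes the surrounding job setup from the question; "empA" is the
// column family the question's code already writes to.
JavaPairRDD<ImmutableBytesWritable, Put> inserts = empDataEvent.mapToPair(
        new PairFunction<empData, ImmutableBytesWritable, Put>() {

            public Tuple2<ImmutableBytesWritable, Put> call(empData emp) throws Exception {
                // Assumption: the record id doubles as the HBase row key.
                Put put = new Put(Bytes.toBytes(emp.id()));
                put.add(Bytes.toBytes("empA"), Bytes.toBytes("id"), Bytes.toBytes(emp.id()));
                put.add(Bytes.toBytes("empA"), Bytes.toBytes("name"), Bytes.toBytes(emp.getName()));
                return new Tuple2<ImmutableBytesWritable, Put>(new ImmutableBytesWritable(), put);
            }
        });

inserts.saveAsNewAPIHadoopDataset(newAPIJobConfiguration1.getConfiguration());
```

The key change is only the PairFunction's input type: Row is what you get from DataFrame APIs, while a JavaRDD&lt;empData&gt; hands the POJO straight to `call`, so no positional `getString(n)` lookups are needed.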

0 Answers:

There are no answers yet.