MapReduce job to load a TSV file

Asked: 2017-07-28 13:02:02

Tags: mapreduce hbase google-cloud-bigtable

I am trying to load data into Cloud Bigtable with a MapReduce job, using the value of the first column as the HBase row key. Here are my code and a sample TSV file.

1011    v1  v2  v3  v4
1012    c1  c2  c3  c4
1013    k1  k2  k3  k4
1014    s1  s2  s3  s4
1015    r1  r3  r2  r4
1016    p1  p2  p7  p9

Here is my code sample:

    public static class TokenizerMapper extends
            Mapper<Text, Text, Text,Text> {
        @Override
        public void map(Text key, Text value, Context context) throws IOException,
                InterruptedException {

            String fields[] = null;
            CSVParser csvParser = new CSVParser('\t');
            fields = csvParser.parseLine(value.toString());
            LOG.info(fields[0]);
            context.write(new Text(fields[0]), value);
        }
    }

    public static class MyTableReducer extends
            TableReducer<Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String[] fields = null;
            CSVParser csvParser = new CSVParser('\t');
            try {
                for(Text value: values) {
                    fields = csvParser.parseLine(value.toString());
                    for (int i = 1; i < fields.length; ++i) {

                        Put put = new Put(Bytes.toBytes(fields[0]));
                        put.addColumn(COLUMN_FAMILY, Bytes.toBytes(cols[i]), Bytes.toBytes(fields[i]));
                        context.write(key, put);

                    }
                }
            } catch (Exception ex) {
                context.getCounter("HBaseKVMapper", "PARSE_ERRORS").increment(1);
                return;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
        Job job = Job.getInstance(conf, "BigTableLoader");
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        KeyValueTextInputFormat.addInputPath(job,new Path(args[0]));

        TableName tableName = TableName.valueOf(args[1]);
        job.setJarByClass(BigtableLoader.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setMapOutputValueClass(Text.class);
        job.setMapOutputKeyClass(Text.class);
        TableMapReduceUtil.initTableReducerJob(tableName.getNameAsString(), MyTableReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The problem I am seeing is that the second column, not the first, ends up as the row key in the table, and the first column is ignored entirely, i.e. it does not appear in the table at all. Is there something I am missing?
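For reference, `KeyValueTextInputFormat` splits each input line at the *first* occurrence of the separator byte (a tab here): everything before it arrives in `map()` as the key, and only the remainder as the value. A minimal standalone sketch of that split (plain Java, no Hadoop dependency; the class name is made up for illustration) shows what the mapper in the code above actually receives for one sample line:

```java
public class KeyValueSplitSketch {
    public static void main(String[] args) {
        // One line of the sample TSV from the question.
        String line = "1011\tv1\tv2\tv3\tv4";

        // KeyValueTextInputFormat splits at the first separator:
        int sep = line.indexOf('\t');
        String key = line.substring(0, sep);     // what map() receives as key
        String value = line.substring(sep + 1);  // what map() receives as value

        // Parsing the value, as the mapper does, starts at the SECOND column.
        String[] fields = value.split("\t");

        System.out.println("key=" + key);             // key=1011
        System.out.println("fields[0]=" + fields[0]); // fields[0]=v1
    }
}
```

So in the mapper above, `value` no longer contains the first column, and `fields[0]` parsed from it is the second column of the original line.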

0 Answers

There are no answers yet.