I'm trying to load data into Cloud Bigtable with a MapReduce job so that the value of the first column becomes the HBase row key. Here is a sample of my TSV input file:
1011 v1 v2 v3 v4
1012 c1 c2 c3 c4
1013 k1 k2 k3 k4
1014 s1 s2 s3 s4
1015 r1 r3 r2 r4
1016 p1 p2 p7 p9
And here is my code:
public static class TokenizerMapper extends Mapper<Text, Text, Text, Text> {

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the value on tabs and emit the first field as the map output key.
        CSVParser csvParser = new CSVParser('\t');
        String[] fields = csvParser.parseLine(value.toString());
        LOG.info(fields[0]);
        context.write(new Text(fields[0]), value);
    }
}
public static class MyTableReducer extends TableReducer<Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        CSVParser csvParser = new CSVParser('\t');
        try {
            for (Text value : values) {
                String[] fields = csvParser.parseLine(value.toString());
                // COLUMN_FAMILY and cols (the column qualifiers) are defined elsewhere in the class.
                for (int i = 1; i < fields.length; ++i) {
                    Put put = new Put(Bytes.toBytes(fields[0]));
                    put.addColumn(COLUMN_FAMILY, Bytes.toBytes(cols[i]), Bytes.toBytes(fields[i]));
                    context.write(key, put);
                }
            }
        } catch (Exception ex) {
            context.getCounter("HBaseKVMapper", "PARSE_ERRORS").increment(1);
        }
    }
}
public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");
    Job job = Job.getInstance(conf, "BigTableLoader");
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    KeyValueTextInputFormat.addInputPath(job, new Path(args[0]));
    TableName tableName = TableName.valueOf(args[1]);
    job.setJarByClass(BigtableLoader.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setMapOutputValueClass(Text.class);
    job.setMapOutputKeyClass(Text.class);
    TableMapReduceUtil.initTableReducerJob(tableName.getNameAsString(), MyTableReducer.class, job);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The problem I'm running into is that the second column, not the first, ends up as the row key in the table, and the first column is skipped entirely, i.e. it never appears in the table at all. Is there something I'm missing?
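My current suspicion, for what it's worth: since the job uses KeyValueTextInputFormat, each line is already split at the first tab before the mapper runs, so the mapper's key holds the first column ("1011") and value holds only the remaining columns ("v1\tv2\tv3\tv4"), which would explain both symptoms. Below is a minimal, untested sketch of the change I'm considering (it reuses my COLUMN_FAMILY and cols fields, with the field indices shifted to account for the split):

public static class TokenizerMapper extends Mapper<Text, Text, Text, Text> {

    @Override
    public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // key already holds the first column and value holds the rest,
        // so forward them unchanged instead of re-parsing value.
        context.write(key, value);
    }
}

public static class MyTableReducer extends TableReducer<Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        CSVParser csvParser = new CSVParser('\t');
        for (Text value : values) {
            // fields now starts at the second column of the original line.
            String[] fields = csvParser.parseLine(value.toString());
            for (int i = 0; i < fields.length; ++i) {
                // Take the row key from the reduce key (the first column).
                Put put = new Put(Bytes.toBytes(key.toString()));
                put.addColumn(COLUMN_FAMILY, Bytes.toBytes(cols[i + 1]), Bytes.toBytes(fields[i]));
                context.write(key, put);
            }
        }
    }
}

Does this reading of KeyValueTextInputFormat's behavior look right, or is something else going on?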