Question

I used the HBase Export utility tool to export an HBase table to HDFS as a SequenceFile.

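Now I want to process this SequenceFile with a MapReduce job, but the job always throws an exception (Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'). This is my code: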

public class MapSequencefile {
        public static class MyMapper extends Mapper<LongWritable, Text, Text, Text>{
            @Override
            protected void map(LongWritable key, Text value,
                    Mapper<LongWritable, Text, Text, Text>.Context context)
                    throws IOException, InterruptedException {
                System.out.println(key+"...."+value);
            }
        }

        public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {  

            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf , MapSequencefile.class.getSimpleName());

            job.setJarByClass(MapSequencefile.class);
            job.setNumReduceTasks(0);
            job.setMapperClass(MyMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setInputFormatClass(SequenceFileInputFormat.class); //use SequenceFileInputFormat
            FileInputFormat.setInputPaths(job, "hdfs://192.16.31.10:8020/input/");
            FileOutputFormat.setOutputPath(job, new Path("hdfs://192.16.31.10:8020/out/"));
            job.waitForCompletion(true);
        }  
}

What can I do to fix this error?

Answer 1

I assume you are using this command to do the export:

$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]

as described on this HBase page: http://hbase.apache.org/0.94/book/ops_mgt.html#export

Looking at the source code of org.apache.hadoop.hbase.mapreduce.Export, you can see that it sets:

job.setOutputFormatClass(SequenceFileOutputFormat.class);
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);

This agrees with your error (the value is a Result object):

Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'

So your mapper signature needs to change to:

Mapper<ImmutableBytesWritable, Result, Text, Text>

You also need to include the correct HBase libraries in your project so that it has access to:

org.apache.hadoop.hbase.client.Result
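
To illustrate why the reader reports a missing deserializer for the value class, here is a minimal, self-contained sketch of the lookup idea. It is not Hadoop's actual code: the Writable and Serialization interfaces, KeyType, ResultLike, find and openReader are all hypothetical stand-ins, and it assumes an HBase version in which Result is not a Hadoop Writable.

```java
import java.util.ArrayList;
import java.util.List;

public class DeserializerLookupSketch {
    // Hypothetical stand-in for Hadoop's Writable marker interface.
    public interface Writable {}

    // Hypothetical stand-in for a pluggable serialization:
    // it only reports which classes it is able to handle.
    public interface Serialization {
        boolean accept(Class<?> c);
    }

    // Stand-in key type that implements Writable (like ImmutableBytesWritable).
    public static class KeyType implements Writable {}

    // Stand-in value type that does NOT implement Writable
    // (assumption about Result in the HBase version in use).
    public static class ResultLike {}

    // Returns the first registered serialization that accepts the class, or null.
    public static Serialization find(List<Serialization> registered, Class<?> c) {
        for (Serialization s : registered) {
            if (s.accept(c)) {
                return s;
            }
        }
        return null;
    }

    // Models the failure mode: with no matching serialization the reader cannot be opened.
    public static String openReader(List<Serialization> registered, Class<?> valueClass) {
        if (find(registered, valueClass) == null) {
            return "Could not find a deserializer for the Value class: '"
                    + valueClass.getSimpleName() + "'";
        }
        return "ok";
    }

    public static void main(String[] args) {
        List<Serialization> registered = new ArrayList<>();
        // Only a Writable-based serialization is registered at first.
        registered.add(c -> Writable.class.isAssignableFrom(c));
        System.out.println(openReader(registered, KeyType.class));
        System.out.println(openReader(registered, ResultLike.class));

        // Registering a serialization that accepts the value type resolves the lookup.
        registered.add(c -> ResultLike.class.isAssignableFrom(c));
        System.out.println(openReader(registered, ResultLike.class));
    }
}
```

In the sketch, the lookup fails for the value type until a serialization that accepts it is registered. In a real job the analogous step would be adding HBase's org.apache.hadoop.hbase.mapreduce.ResultSerialization to the io.serializations setting. Whether that is needed depends on your HBase version, so treat it as something to check rather than a confirmed fix.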
