Reading from HDFS and writing to HBase

Date: 2017-02-24 14:35:59

Tags: hadoop hbase elastic-map-reduce

The mappers read files from two places: 1) articles visited by users (grouped by country), and 2) country-wise statistics.

The output of both mappers is Text, Text.

I am running the program on an Amazon EMR cluster.

My goal is to read data from the two different sets, merge the results, and store them in HBase.
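The mapper classes are not shown, but judging from the reducer further down, ArticleMapper appears to emit a literal "***" marker per visited article while StatisticsMapper emits stat strings. A plain-Java sketch of the intended merge logic (an assumption about the mappers' output, not the actual code):

```java
import java.util.Arrays;
import java.util.List;

public class MergeSketch {
    // Models the reducer's merge for one key: count "***" markers
    // (assumed to come from ArticleMapper) and concatenate everything else
    // (assumed to come from StatisticsMapper).
    static String merge(List<String> values) {
        int counter = 0;
        StringBuilder sbr = new StringBuilder();
        for (String v : values) {
            if (v.equals("***")) {
                counter++;
            } else {
                sbr.append(v).append(",");
            }
        }
        sbr.append("Article count : ").append(counter);
        return sbr.toString();
    }

    public static void main(String[] args) {
        // prints "pop=67M,Article count : 2"
        System.out.println(merge(Arrays.asList("***", "pop=67M", "***")));
    }
}
```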

HDFS-to-HDFS output works fine. The job gets stuck at 67% reduce and then fails with this error:

17/02/24 10:45:31 INFO mapreduce.Job:  map 0% reduce 0%
17/02/24 10:45:37 INFO mapreduce.Job:  map 100% reduce 0%
17/02/24 10:45:49 INFO mapreduce.Job:  map 100% reduce 67%
17/02/24 10:46:00 INFO mapreduce.Job: Task Id : attempt_1487926412544_0016_r_000000_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Row length is 0
        at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:565)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:110)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:68)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:58)
        at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:45)
        at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:1)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The driver class:

package com.happiestminds.hadoop;



import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class Main extends Configured implements Tool {

    /**
     * @param args
     * @throws Exception
     */
    public static String outputTable = "mapreduceoutput";

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Main(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {


        Configuration config = HBaseConfiguration.create();

        try{
            HBaseAdmin.checkHBaseAvailable(config);
        }
        catch(MasterNotRunningException e){
            System.out.println("Master not running");
            System.exit(1);
        }

        Job job = Job.getInstance(config, "Hbase Test");

        job.setJarByClass(Main.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);



        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, ArticleMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, StatisticsMapper.class);

        TableMapReduceUtil.addDependencyJars(job);
        TableMapReduceUtil.initTableReducerJob(outputTable, CounterReducer.class, job);

        //job.setReducerClass(CounterReducer.class);

        job.setNumReduceTasks(1);


        return job.waitForCompletion(true) ? 0 : 1;
    }

}

The reducer class:

package com.happiestminds.hadoop;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


public class CounterReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

    public static final byte[] CF = "counter".getBytes();
    public static final byte[] COUNT = "combined".getBytes();


    @Override
    protected void reduce(Text key, Iterable<Text> values,
            Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
            throws IOException, InterruptedException {

        String vals = values.toString();
        int counter = 0;

        StringBuilder sbr = new StringBuilder();
        System.out.println(key.toString());
        for (Text val : values) {
            String stat = val.toString();
            if (stat.equals("***")) {
                counter++;
            } else {
                sbr.append(stat + ",");
            }

        }
        sbr.append("Article count : " + counter);


        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
        if (counter != 0) {
            context.write(null, put);
        }

    }



}

Dependencies:

<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>



        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>1.2.2</version>
        </dependency>


        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.2</version>
        </dependency>



    </dependencies>

4 Answers:

Answer 0 (score: 1)

A good practice is to validate your values before submitting them anywhere. In your particular case, you can validate the key and sbr, or wrap the write in a try-catch block with an appropriate notification policy. If they are invalid, you should log them and update your unit tests with a new test case:

try {
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
    if (counter != 0) {
        context.write(null, put);
    }
} catch (IllegalArgumentException ex) {
    System.err.println("Error processing record - Key: " + key.toString() + ", values: " + sbr.toString());
}
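Alternatively, the condition can be checked up front instead of catching the exception. A minimal sketch of such a guard (the helper name `shouldWrite` is hypothetical, not part of the original code):

```java
public class KeyGuard {
    // Skip records whose row key would be empty, mirroring the reducer's
    // existing "counter != 0" condition. Assumed guard, not the actual fix.
    static boolean shouldWrite(String key, int counter) {
        return key != null && !key.trim().isEmpty() && counter != 0;
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite("", 5));   // false: empty key
        System.out.println(shouldWrite("IN", 5)); // true
    }
}
```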

Answer 1 (score: 1)

From the exception the program throws, it is clear that the key length is 0. Check the key length before the put, and only write to HBase when it is non-zero.

To be clearer about why HBase does not support row keys of length 0: the HBase data model does not allow zero-length row keys; a row key must be at least 1 byte. A zero-byte row key is reserved for internal use (to designate an empty start key or end key).
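The constraint can be expressed as a small standalone check. This is a sketch mirroring HBase's internal `Mutation.checkRow` validation (the method name `isValidRowKey` is made up here; the 32767-byte upper bound corresponds to HBase's `HConstants.MAX_ROW_LENGTH`, i.e. `Short.MAX_VALUE`):

```java
public class RowKeyCheck {
    // A row key must be non-null, non-empty, and at most
    // Short.MAX_VALUE (32767) bytes long, as enforced by
    // org.apache.hadoop.hbase.client.Mutation.checkRow.
    static boolean isValidRowKey(byte[] row) {
        return row != null && row.length > 0 && row.length <= Short.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(isValidRowKey(new byte[0]));    // false: this is the asker's error
        System.out.println(isValidRowKey("GB".getBytes())); // true
    }
}
```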

Answer 2 (score: 0)

Can you check whether you are inserting any null or empty values?

The HBase data model does not allow zero-length row keys; a row key must be at least 1 byte.

Before executing the put, check your reducer code to see whether some keys are being populated as null or empty.

Answer 3 (score: 0)

The error you are getting is self-explanatory. Row keys in HBase cannot be empty (though values can be).

@Override
protected void reduce(Text key, Iterable<Text> values,
        Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
        throws IOException, InterruptedException {
    if (key == null || key.getLength() == 0) {
      // Log a warning about the empty key.
      return;
    }
    // Rest of your reducer follows.
}