How can I store the results of my MapReduce job in a HashMap and sort it by value?

Asked: 2014-05-23 07:47:51

Tags: java hadoop mapreduce hbase

I am new to HBase MapReduce jobs, and I want to compute the top 10 users in my table.

In my Reducer class, I put a local HashMap to store each result so that I can sort the map afterwards.

My question is:

How can I print out the contents of my HashMap? Adding a `System.out.println` statement does not work.

public class MyScanner2 {

static Configuration conf; 
static long startTimestamp;
static long stopTimestamp;
static Scan myScan;
static String tableToScan = "VStable";

public static void main(String[] args) throws IOException, ParseException, InterruptedException, ClassNotFoundException {
    // TODO Auto-generated method stub

    initScanner();

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "TOP10_users"); //TOP10_users is the name of the job
    job.setJarByClass(MyScanner2.class);
    FileOutputFormat.setOutputPath(job, new Path("hdfs://zwinf5q45:8020/user/hdfs/top10users"));
    TableMapReduceUtil.initTableMapperJob(Bytes.toBytes(tableToScan), myScan, Mapper1.class, ImmutableBytesWritable.class, IntWritable.class, job);
    TableMapReduceUtil.initTableReducerJob("stats", Reducer1.class, job);
    //System.out.println(MyReducer.getMap().toString());
    System.exit(job.waitForCompletion(true) ? 0 : 1);

}

public static void initScanner() throws IOException, ParseException{

    conf = HBaseConfiguration.create();
    conf.set("hbase.rootdir", "hdfs://zwinf5q45:8020/apps/hbase/data");
    conf.set("hbase.zookeeper.quorum", "zwinf5q46,zwinf5q44,zwinf5q43,zwinf5q42,zwinf5q41");
    conf.set("zookeeper.znode.parent", "/hbase-unsecure");

    startTimestamp = convertToTimestamp("2014-05-21");
    stopTimestamp = convertToTimestamp("2014-05-22");

    myScan = new Scan();
    myScan.setStartRow(Bytes.toBytes(startTimestamp));
    myScan.setStopRow(Bytes.toBytes(stopTimestamp));
    myScan.addColumn(Bytes.toBytes("infos"), Bytes.toBytes("bucketID"));
    myScan.setCaching(1000);



}

 public static long convertToTimestamp(String str_date) throws ParseException{

     SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
     java.util.Date date = sdf.parse(str_date);
     java.sql.Timestamp timestamp= new java.sql.Timestamp(date.getTime());

     return timestamp.getTime();
}
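As an aside, the `java.sql.Timestamp` wrapper in `convertToTimestamp` is unnecessary: `Date.getTime()` already returns epoch milliseconds. A minimal standalone sketch of the same conversion (pinning the time zone to UTC for reproducibility, which the original code does not do):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimestampDemo {
    // Hypothetical simplified version of the question's convertToTimestamp.
    static long convertToTimestamp(String strDate) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        sdf.setTimeZone(TimeZone.getTimeZone("UTC")); // pin the zone so results are stable
        // Date.getTime() already yields epoch millis; no java.sql.Timestamp needed.
        return sdf.parse(strDate).getTime();
    }

    public static void main(String[] args) throws ParseException {
        long start = convertToTimestamp("2014-05-21");
        long stop = convertToTimestamp("2014-05-22");
        System.out.println(stop - start); // one day in milliseconds
    }
}
```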

}

class Mapper1 extends TableMapper<ImmutableBytesWritable, IntWritable> {

private int numRecords = 0;
private static final IntWritable one = new IntWritable(1);

@Override
public void map(ImmutableBytesWritable row, Result values, Context context) throws IOException {

    // extract resource
    if (values.isEmpty()){
        System.out.println("The scanner is empty");
    }
    else{

        ImmutableBytesWritable resource = new ImmutableBytesWritable(values.getValue(Bytes.toBytes("infos"), Bytes.toBytes("bucketID")));

        try {
            context.write(resource, one);
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
        numRecords++;
        if ((numRecords % 10000) == 0) {
            context.setStatus("mapper processed " + numRecords + " records so far");
        }
    }


}

}

class Reducer1 extends TableReducer<ImmutableBytesWritable, IntWritable, ImmutableBytesWritable> {

static HashMap<String,Integer> map = new HashMap<String,Integer>();

public void reduce(ImmutableBytesWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {

    int sum = 0;

    for (IntWritable val : values) {
        sum += val.get();
    }        
    map.put(key.toString(),sum);
    System.out.println("HashMap content: " + Arrays.toString(map.values().toArray()));

}

}
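The sort-by-value step the title asks about is plain Java and can be sketched independently of Hadoop (the user names and counts below are made up for illustration; in the real job the map would be filled by the reducer):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Top10Demo {
    // Sort a user -> count map by value (descending) and keep the top n entries.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("userA", 7);   // hypothetical data
        counts.put("userB", 42);
        counts.put("userC", 19);
        for (Map.Entry<String, Integer> e : topN(counts, 10)) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        // prints userB = 42, then userC = 19, then userA = 7
    }
}
```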

2 Answers:

Answer 0 (score: 0)

As for why the output looks wrong, try

System.out.println(Arrays.toString(map.values().toArray()));

Answer 1 (score: 0)

First of all, the correct approach is to use context.write() to output the reducer's results to a file or an HBase table.

If you want to print output with println(), you can still view it in the MapReduce web console (usually at http://localhost:50030). Select a job, go to the completed reduce tasks, select a task, and check its "Task Logs". Note that each reducer creates its own stdout in the Task Logs, so you may need to check the logs of every reducer.