Java MapReduce - counting in the reducer

Time: 2015-04-07 06:41:19

Tags: java apache hadoop count mapreduce

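I am writing a MapReduce job. I have two datasets. I have to combine them by ID and count the occurrences of each ID separately for each context. (For example, if the data lists travel agencies operating in several states, the output I need has the form: user ID - number of visits in NY, number of visits in IL.) The dataset contains the field state:' NY'. I have a predefined set of states (NY, IL).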

In the reduce step, the counts always come out as zero even though the data is present. For every ID, my output is UID 0 0. Here is my code:

`import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;


public class myMap {
    /* Map*/
     public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
         public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
             String line = value.toString();
             StringTokenizer tokens = new StringTokenizer(line, ",");
             Boolean eventFlag = false;
             String UID = "", state = "";

             while (tokens.hasMoreTokens()) {
                 String currToken = tokens.nextToken();
                 String[] keyValue = currToken.split(":");

                 if (keyValue[0].equals( "state")) {
                         state = keyValue[1].trim();

                 }
                 if (keyValue[0].equalsIgnoreCase( "user")) {
                     UID = keyValue[1];
                 }

             }

                 output.collect(new Text(UID), new Text(state));
         }
     }

     /* Reducer*/
     public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text,Text> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            int nyCnt = 0;
            int ilCnt = 0;
            String currValue = new String();
            while (values.hasNext()) {
                currValue = values.next().toString();
                if (currValue.equalsIgnoreCase("NY")) {
                    nyCnt+=1;
                }
                if (currValue.equalsIgnoreCase("IL")) {
                    ilCnt+=1;
                }
                output.collect(key , new Text(currValue));

            }
            String counts = Integer.toString(nyCnt) + " " + Integer.toString(ilCnt);
             output.collect(key, new Text(counts) );    
        }
     }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(myMap.class);
        conf.setJobName("myMap");
        conf.setJarByClass(myMap.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));


        JobClient.runJob(conf);

    }
}


`

Any help with the error would be appreciated. Thanks.
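One thing worth checking, sketched here in plain Java without Hadoop: if the raw field really looks like state:' NY', then `trim()` leaves the quotes (and the space inside them) in place, so the value never equals "NY" and both counters stay at zero. The class `StateCountSketch`, its methods, and the sample record layout `user:u1,state:' NY'` are hypothetical and only illustrate the parsing and counting; they are not taken from the real datasets.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: the parse-and-count logic in plain Java
// so it can be checked without a cluster.
class StateCountSketch {

    // Remove whitespace and single/double quotes, so "' NY'" becomes "NY".
    static String normalizeState(String raw) {
        return raw.replaceAll("[\\s'\"]", "").toUpperCase();
    }

    // Parse one "key:value,key:value" record into {uid, state}; missing fields stay "".
    // The record layout is an assumption based on the description in the question.
    static String[] parseRecord(String line) {
        String uid = "";
        String state = "";
        for (String token : line.split(",")) {
            String[] kv = token.split(":", 2);
            if (kv.length < 2) {
                continue; // token without a colon: skip instead of indexing kv[1]
            }
            String field = kv[0].trim();
            if (field.equalsIgnoreCase("state")) {
                state = normalizeState(kv[1]);
            } else if (field.equalsIgnoreCase("user")) {
                uid = kv[1].trim();
            }
        }
        return new String[] {uid, state};
    }

    // Count NY and IL records per user and format each result as "nyCnt ilCnt".
    static Map<String, String> countByUser(List<String> lines) {
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] rec = parseRecord(line);
            if (rec[0].isEmpty()) {
                continue; // no user ID on this record
            }
            int[] c = counts.computeIfAbsent(rec[0], k -> new int[2]);
            if (rec[1].equals("NY")) {
                c[0]++;
            } else if (rec[1].equals("IL")) {
                c[1]++;
            }
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] + " " + e.getValue()[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "user:u1,state:' NY'",
                "user:u1,state:'IL'",
                "user:u2,state:' NY'");
        System.out.println(countByUser(sample));
    }
}
```

Separately, the job registers `Reduce` as the combiner. A combiner's output becomes the reducer's input, and Hadoop may run it zero or more times, so the reducer can receive the "nyCnt ilCnt" strings as values next to the raw states. Removing the `setCombinerClass` line while debugging keeps what the reducer sees predictable.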

0 answers:

No answers