Hadoop MapReduce practice

Time: 2015-10-07 09:16:01

Tags: java hadoop mapreduce

Input data file:

name,month,category,spend

hitesh,1,A1,10020  
hitesh,2,A2,10300  
hitesh,3,A3,10400  
hitesh,4,A4,11000  
hitesh,5,A1,21000  
hitesh,6,A2,5000  
hitesh,7,A3,9000  
hitesh,8,A4,1000  
hitesh,9,A1,111000    
hitesh,10,A2,12000  
hitesh,11,A3,71000  
hitesh,12,A4,177000    
kuwar,1,A1,10700  
kuwar,2,A2,17000  
kuwar,3,A3,10070  
kuwar,4,A4,10007   

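I need the total spend per person and the number of unique categories each person spent on. (The output should look like: name, total spend, number of unique categories.)

What I have tried so far, my code:

Person-wise total spend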

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Emp
{
    public static class MyMap extends Mapper<LongWritable,Text,Text,IntWritable>
    {
        public void map(LongWritable k, Text v, Context con)
        throws IOException, InterruptedException
        {
            String line = v.toString();
            String[] w = line.split(",");
            String person = w[0];
            int exp = Integer.parseInt(w[3]);
            con.write(new Text(person), new IntWritable(exp));
        }
    }
    public static class MyRed extends Reducer<Text,IntWritable,Text,IntWritable>
    {
        public void reduce(Text k, Iterable<IntWritable> vlist, Context con)
        throws IOException, InterruptedException
        {
            int tot = 0;
            for (IntWritable v : vlist)
                tot += v.get();
            con.write(k, new IntWritable(tot));
        }
    }
    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        Job j = new Job(c, "person-wise");
        j.setJarByClass(Emp.class);
        j.setMapperClass(MyMap.class);
        j.setReducerClass(MyRed.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        Path p1 = new Path(args[0]);
        Path p2 = new Path(args[1]);
        FileInputFormat.addInputPath(j, p1);
        FileOutputFormat.setOutputPath(j, p2);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}

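How can I get the number of unique categories in this program, and how do I make the output look like name, total spend, number of unique categories?

Thanks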

2 answers:

Answer 0: (score: 0)

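You can create a custom Writable pair made of an IntWritable and a Text, one for the spend and the other for the category, and use it as the map value. Alternatively, pass the spend and the category together in a single string with a separator between them, and split that string on the reducer side.

Once you have both values in the reducer, sum the spend in the for loop and, in the same loop, put every category into a Java Set. Then use set.size() to get the number of unique categories and emit it through context.write. To emit the reduce-side value you can use the same technique that was used to pass the map value.

In the Mapper, use a StringBuilder to append the category and the spend, and pass the result as the map value.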

StringBuilder sb = new StringBuilder();
String sep=":";
sb.append(w[2]);
sb.append(sep);
sb.append(w[3]);

con.write(new Text(person), new Text(sb.toString()));

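On the reduce side, split the value with the same separator that was used on the map side, sum the spend, and take the size of the set built from the categories. The code is not tested; declare any variables that turn out to be missing from the code below.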

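// needs java.util.Set and java.util.HashSet in addition to the Hadoop imports
public static class MyRed extends Reducer<Text,Text,Text,Text>
{
    public void reduce(Text k, Iterable<Text> vlist, Context con)
    throws IOException, InterruptedException
    {
        int tot = 0;
        String myval;
        String[] split_val;
        Set<String> myset = new HashSet<String>();
        int uniq_category;
        StringBuilder sb1 = new StringBuilder();
        for (Text v : vlist)
        {
            myval = v.toString();
            split_val = myval.split(":");            // same separator as on the map side
            myset.add(split_val[0]);                 // category
            tot += Integer.parseInt(split_val[1]);   // spend
        }
        uniq_category = myset.size();
        String sep = "    ";
        // total spend first, then the unique-category count, as the question asks
        sb1.append(tot);
        sb1.append(sep);
        sb1.append(uniq_category);
        con.write(k, new Text(sb1.toString()));
    }
}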
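Since the code above is untested, here is a minimal plain-Java sketch for checking the same logic locally without a cluster. It only imitates the map, group and reduce steps with ordinary collections; the class name SpendSummaryCheck and its method names are made up for this illustration, and the output uses a comma instead of the four-space separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class name; plain Java, no Hadoop dependency.
class SpendSummaryCheck {

    // Imitates the mapper plus the shuffle: key = person, value = "category:spend".
    static Map<String, List<String>> mapAndGroup(List<String> lines) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String line : lines) {
            String[] w = line.trim().split(",");
            grouped.computeIfAbsent(w[0], key -> new ArrayList<>()).add(w[2] + ":" + w[3]);
        }
        return grouped;
    }

    // Imitates the reducer: sum the spend and collect categories in one loop.
    static String reduce(List<String> values) {
        int tot = 0;
        Set<String> categories = new HashSet<>();
        for (String v : values) {
            String[] parts = v.split(":");
            categories.add(parts[0]);
            tot += Integer.parseInt(parts[1]);
        }
        return tot + "," + categories.size();
    }

    public static void main(String[] args) {
        // A few rows taken from the sample input in the question.
        List<String> lines = Arrays.asList(
            "hitesh,1,A1,10020", "hitesh,5,A1,21000", "hitesh,2,A2,10300",
            "kuwar,1,A1,10700", "kuwar,2,A2,17000");
        // Prints "hitesh,41320,2" and then "kuwar,27700,2".
        mapAndGroup(lines).forEach((person, values) ->
            System.out.println(person + "," + reduce(values)));
    }
}
```

The Set is what makes the category count "unique": hitesh has two A1 rows here, but A1 is counted once.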

Alternatively, as mentioned above, create a pair of IntWritable and Text and use it as the map and reduce value.
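A rough sketch of what such a pair could look like, assuming plain int and String fields rather than nested IntWritable and Text objects. The class name SpendCategoryPair and the roundTrip helper are hypothetical. In a real job the class would declare "implements org.apache.hadoop.io.Writable"; that clause is omitted here only so the sketch compiles without Hadoop on the classpath. The write and readFields signatures match that interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type holding one spend amount and its category.
class SpendCategoryPair {
    private int spend;
    private String category;

    // Hadoop creates Writable instances reflectively, so keep a no-arg constructor.
    SpendCategoryPair() { this(0, ""); }

    SpendCategoryPair(int spend, String category) {
        this.spend = spend;
        this.category = category;
    }

    int getSpend() { return spend; }
    String getCategory() { return category; }

    // Same signature as Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeInt(spend);
        out.writeUTF(category);
    }

    // Same signature as Writable.readFields: read the fields back in the same order.
    public void readFields(DataInput in) throws IOException {
        spend = in.readInt();
        category = in.readUTF();
    }

    // Illustration only: serialize to bytes and read into a fresh instance.
    static SpendCategoryPair roundTrip(SpendCategoryPair original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            SpendCategoryPair copy = new SpendCategoryPair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SpendCategoryPair copy = roundTrip(new SpendCategoryPair(10020, "A1"));
        System.out.println(copy.getCategory() + ":" + copy.getSpend()); // prints "A1:10020"
    }
}
```

With a value type like this, the mapper would emit the pair instead of a delimited string, and the driver would presumably also need j.setMapOutputValueClass(SpendCategoryPair.class), because the map output value type then differs from the final output value type.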

Answer 1: (score: 0)

I have made the modifications directly in your code. Hope this is useful.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Emp
{
    public static class MyMap extends Mapper<LongWritable,Text,Text,Text>
    {
        public void map(LongWritable k, Text v, Context con)
        throws IOException, InterruptedException
        {
            String line = v.toString();
            String[] w = line.split(",");
            String person = w[0];
            con.write(new Text(person), new Text(line)); // pass the whole line to the reducer
        }
    }
    public static class MyRed extends Reducer<Text,Text,Text,Text>
    {
        public void reduce(Text k, Iterable<Text> vlist, Context con)
        throws IOException, InterruptedException
        {
            int tot = 0;
            Set<String> cat = new HashSet<String>();
            for (Text v : vlist) {
                String data = v.toString();
                String[] dataArray = data.split(",");
                tot += Integer.parseInt(dataArray[3]); // calculating the total spend
                cat.add(dataArray[2]); // collecting the unique categories
            }
            // writing the name, total spend and total unique categories to the output
            con.write(k, new Text(tot + "," + cat.size()));
        }
    }
    public static void main(String[] args) throws Exception
    {
        Configuration c = new Configuration();
        Job j = new Job(c, "person-wise");
        j.setJarByClass(Emp.class);
        j.setMapperClass(MyMap.class);
        j.setReducerClass(MyRed.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(Text.class); // map and reduce values are both Text now
        Path p1 = new Path(args[0]);
        Path p2 = new Path(args[1]);
        FileInputFormat.addInputPath(j, p1);
        FileOutputFormat.setOutputPath(j, p2);
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }
}
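One thing the code above does not guard against is an input line that is not a clean record. If the file really contains the header row or trailing spaces after the amount (an assumption; the sample as pasted in the question seems to have both), Integer.parseInt will throw a NumberFormatException and the task will fail. A possible defensive parsing helper is sketched below; ExpenseLineParser and Record are hypothetical names, and silently skipping bad lines is only one of several reasonable policies.

```java
import java.util.Optional;

// Hypothetical helper; plain Java, no Hadoop dependency.
class ExpenseLineParser {

    // Parsed view of one "name,month,category,spend" record.
    static final class Record {
        final String person;
        final String category;
        final int spend;

        Record(String person, String category, int spend) {
            this.person = person;
            this.category = category;
            this.spend = spend;
        }
    }

    // Returns empty for blank lines, a header row, or any non-numeric spend.
    static Optional<Record> parse(String line) {
        String[] w = line.trim().split(",");
        if (w.length != 4) {
            return Optional.empty();
        }
        try {
            int spend = Integer.parseInt(w[3].trim());
            return Optional.of(new Record(w[0].trim(), w[2].trim(), spend));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Trailing spaces are tolerated; the header row is skipped.
        System.out.println(parse("hitesh,9,A1,111000    ").isPresent());
        System.out.println(parse("name,month,category,spend").isPresent());
    }
}
```

Inside map(), the mapper could call parse(v.toString()) and write to the context only when a record is present.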