Question

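I am learning MapReduce. I have finished one assignment, and now I have to change my code to accept a different text file as input; the output must show each location and year together with the maximum, minimum, and average values. Here is a sample line from my input: Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168,0.025668362,0.972051954,0.037000279,0.022319018,,,0.003641149,,,0.002936745,,,0.016723641

The output should look like this: Calgary 2009 Average is: Max: Min:

Here is my code, which takes a txt file and computes the avg, min, and max: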

public class AverageMinMax {



public static class Map extends Mapper<LongWritable,Date,Text,Text> {


    //private static final FloatWritable rep= new  FloatWritable(1);
        public void map(LongWritable key,Text value,Context context)
        throws IOException, InterruptedException {
                context.write(new Text("Map_Output"), value);
        };
    }
      public static class Combiner extends Reducer<Text,Text,Text,Text>
      {
      public void reduce(Text key,Iterable<Text> values,Context context) throws IOException,InterruptedException
          {
             Integer NumberOfValues=0;
             double sum=0D;
             double min=0D;
             double max=0D;
             //double min=values.get(0);
              Iterator<Text> itr = values.iterator();
              //convertString=values(0);
              while(itr.hasNext())
              {
                  String TexttoString = itr.next().toString();
                  Double value = Double.parseDouble(TexttoString);
                  if(value<min)
                  {
                      min=value;
                  }
                  if(value>max)
                  {
                      max=value;
                  }
                  NumberOfValues++;
                  sum+=value;
              }
               Double average = sum/NumberOfValues;
                context.write(new Text("Combiner_output"), new Text(average + "," + NumberOfValues+","+min+","+max));
          };
      }
 public static class Reduce extends
       Reducer<Text,Text,Text,Text> {
      public void reduce(Text key, Iterable<Text> values,
        Context context) throws IOException, InterruptedException {
           Integer totalNumberOfValues= 0;
          Double sum=0.00;
          Double min=0D;
          Double max=0D;
          Iterator<Text> itr = values.iterator();
            while(itr.hasNext())
          {
              String TexttoString = itr.next().toString();
              String[] split_String = TexttoString.split(",");
              Double average = Double.parseDouble(split_String[0]);
              Integer NumberOfValues = Integer.parseInt(split_String[1]);
              Double minValue=Double.parseDouble(split_String[2]);
              Double maxValue=Double.parseDouble(split_String[3]);
              if(minValue<min)
              {
                  min=minValue;
              }
              if(maxValue>max)
              {
                  max=maxValue;
              }
              sum+=(average*NumberOfValues);
              totalNumberOfValues+=NumberOfValues;   
          } 
          Double average= sum/totalNumberOfValues;
          context.write(new Text("Average and Minimum and Max is"), new Text(average.toString()+" and "+ min.toString()+" and "+ max.toString()));
          };
     }
     public static void main(String[] args) throws Exception {

         Configuration conf = new Configuration();
         Job job=new Job(conf,"AverageMinMax.class");
         job.setJarByClass(AverageMinMax.class);
         job.setJobName("MapReduceAssignment");
         //JobConf conf = new JobConf(Hadoop_map_reduce.class);

        //conf.setJobName("Hadoop_assignment");
         // Configuration conf = new Configuration();
      //Job job = new Job(conf, "maxmin");
      //job.setJarByClass(Hadoop_map_reduce.class);
     // FileSystem fs = FileSystem.get(conf);
    /*  if (fs.exists(new Path(args[1]))) {
       fs.delete(new Path(args[1]), true);
      }*/
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(Text.class);

         //job.setNumReduceTasks(1);

         job.setMapperClass(Map.class);

        job.setReducerClass(Reduce.class);
         job.setCombinerClass(Combiner.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

     FileInputFormat.addInputPath(job, new Path(args[0]));
    //  FileOutputFormat.setOutputPath(job, new Path(args[1]));
        //FileInputFormat.addInputPath(job, new Path("/home/cloudera/Desktop/assign2"));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
      //  FileOutputFormat.setOutputPath(job, new Path(" user/cloudera/output"));
      job.waitForCompletion(true);
     }

}

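So my first problem is that I don't know how to convert the date in the mapper, or how to use 2 keys and show them in the output. In other words, how do I rewrite this code?

Thanks for your help.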

Answer 1

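OK, it looks like you have several problems. Two come to mind right away:

Your mapper emits the constant key "Map_Output" (and your combiner emits "Combiner_output"). That will not work. You want the key to be something meaningful, probably the city name; in your example, 'Calgary'. That is easy to get with value.toString().split(",")[0] (that is, split value on the , character and take the first element of the resulting array).
Your reducer code does not output the city name at all. You can fix that by calling context.write(new Text(key.toString() + " Average and Minimum and Max is"), new Text(average.toString()+" and "+ min.toString()+" and "+ max.toString())); in your Reducer, where key is the city name described above.

For how to extract dates in Java, see this SO post: Extracting dates from string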
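
For this particular input, one possible way to get the year is java.time.LocalDate, assuming the third comma-separated field is always an ISO date (yyyy-MM-dd) as in the sample line. This is only a sketch in plain Java: the class name KeyExtractor and the method cityYearKey are hypothetical, and in the mapper the returned string would be wrapped in a Text.

```java
import java.time.LocalDate;

public class KeyExtractor {
    // Builds a "City Year" key from one CSV line.
    // Assumes field 0 is the city and field 2 is an ISO date (yyyy-MM-dd).
    public static String cityYearKey(String line) {
        String[] fields = line.split(",");
        int year = LocalDate.parse(fields[2]).getYear();
        return fields[0] + " " + year;
    }

    public static void main(String[] args) {
        String sample = "Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168";
        System.out.println(cityYearKey(sample)); // prints "Calgary 2009"
    }
}
```

Combining both fields into one key means every record for the same city and year reaches the same reduce call.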

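In general, I recommend that you start by reading up on MapReduce itself, its design trade-offs, and how to make the most of it within the Hadoop architecture.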

Answer 2

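Your question is not entirely clear, so I am assuming the following:

You have a data set in which each row gives a location, a date, and some double values that you want to process.
The values you want to process start at the first double value (i.e. 2.5207754, ...).
Your average is the average over all columns of all observations in each year (i.e. if you have 5 samples for 2009 and each sample has 5 values, you want the average of all 25 values).
Your minimum and maximum are the minimum and maximum over all observations in each year.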
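
To make the third assumption concrete: the average over all values pooled together is generally not the same as the average of per-row averages when rows hold different numbers of values, which can happen here because some fields are empty. The sketch below uses made-up numbers, assumes every row has at least one value, and the class name PooledAverage is hypothetical.

```java
public class PooledAverage {
    // Average over every value in every row: total sum / total count.
    public static double pooled(double[][] rows) {
        double sum = 0;
        long count = 0;
        for (double[] row : rows) {
            for (double v : row) {
                sum += v;
                count++;
            }
        }
        return sum / count;
    }

    // Average of the per-row averages, shown only for contrast.
    public static double averageOfAverages(double[][] rows) {
        double sum = 0;
        for (double[] row : rows) {
            double rowSum = 0;
            for (double v : row) {
                rowSum += v;
            }
            sum += rowSum / row.length;
        }
        return sum / rows.length;
    }

    public static void main(String[] args) {
        double[][] rows = { {1.0, 2.0, 3.0}, {10.0} };
        System.out.println(pooled(rows));            // prints 4.0
        System.out.println(averageOfAverages(rows)); // prints 6.0
    }
}
```

This is why partial results passed between stages need to carry a sum and a count (or an average and a count) rather than an average alone.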

If these assumptions are correct, I suggest using Prof. Jeremy Lin's custom datatypes. A possible solution looks like this:

Your key will be the location and year, as Text.

String line = value.toString();
String[] tokens = line.split(",");
String[] date = tokens[2].split("-");
String year = date[0];
String location = tokens[0];

Text locationYear = new Text(location + " " + year);

Your value will be an ArrayListOfDoublesWritable, which you can take from the repo I mentioned above.

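ArrayListOfDoublesWritable readings = new ArrayListOfDoublesWritable();
// Readings start at index 5, the first double column in the sample line.
for (int i = 5; i < tokens.length; i++)
{
  // Skip empty fields (",,") so Double.parseDouble does not throw.
  if (!tokens[i].isEmpty())
  {
    readings.add(Double.parseDouble(tokens[i]));
  }
}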

You can then emit the mapper output as a Text key and an ArrayListOfDoublesWritable value.
```
context.write(locationYear, readings);
```

From here, you can work on the mapper output in the reducer and compute the average, min, and max, for example with the Collections methods on the array list.

I hope this helps.
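
A minimal sketch of that reducer-side computation, written in plain Java without Hadoop types so that it runs on its own. It assumes each mapper value has already been turned into a List<Double>, and YearStats and summarize are hypothetical names. Instead of Collections it uses java.util.DoubleSummaryStatistics, which tracks min, max, sum, and count in a single pass.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;

public class YearStats {
    // Folds all readings that share one "City Year" key into a single summary.
    public static DoubleSummaryStatistics summarize(Iterable<List<Double>> valuesForKey) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (List<Double> readings : valuesForKey) {
            for (double r : readings) {
                stats.accept(r);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Made-up readings for two input lines with the same key.
        List<List<Double>> values = Arrays.asList(
            Arrays.asList(2.5, 0.5),
            Arrays.asList(1.0, 4.0));
        DoubleSummaryStatistics s = summarize(values);
        System.out.println("Calgary 2009 Average is: " + s.getAverage()
            + " Max: " + s.getMax() + " Min: " + s.getMin());
    }
}
```

DoubleSummaryStatistics starts its minimum at positive infinity and its maximum at negative infinity, so the first reading always replaces them; starting both at 0 would report a minimum of 0 whenever every reading is positive.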

MapReduce with 2 keys
