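I am learning how map-reduce works. I have finished one assignment, and now I have to change my code so that it accepts a different text file as input. The output must show the location and the year together with the maximum, minimum, and average values. Here is a sample line of my input: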
Calgary,AB,2009-01-07,604680,12694,2.5207754,0.065721168,0.025668362,0.972051954,0.037000279,0.022319018,,,0.003641149,,,0.002936745,,,0.016723641
The output should look like this:
Calgary 2009 Average is: Max: Min:
Here is my code, which takes a txt file and computes the avg, min, and max:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class AverageMinMax {

    // Emits every input line unchanged under a single constant key.
    // The input value type must be Text (not Date) to match TextInputFormat.
    public static class Map extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text("Map_Output"), value);
        }
    }

    // Parses each value as one number and emits "average,count,min,max".
    public static class Combiner extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int numberOfValues = 0;
            double sum = 0D;
            // Start from the extremes so the first value always replaces them;
            // starting both at 0 would never record a positive minimum.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (Text text : values) {
                double value = Double.parseDouble(text.toString());
                if (value < min) {
                    min = value;
                }
                if (value > max) {
                    max = value;
                }
                numberOfValues++;
                sum += value;
            }
            double average = sum / numberOfValues;
            context.write(new Text("Combiner_output"),
                    new Text(average + "," + numberOfValues + "," + min + "," + max));
        }
    }

    // Expects values of the form "average,count,min,max" as written by the
    // combiner. Hadoop does not guarantee that a combiner runs, so this
    // assumption is fragile.
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int totalNumberOfValues = 0;
            double sum = 0D;
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (Text text : values) {
                String[] parts = text.toString().split(",");
                double average = Double.parseDouble(parts[0]);
                int numberOfValues = Integer.parseInt(parts[1]);
                double minValue = Double.parseDouble(parts[2]);
                double maxValue = Double.parseDouble(parts[3]);
                if (minValue < min) {
                    min = minValue;
                }
                if (maxValue > max) {
                    max = maxValue;
                }
                // Undo the per-group averaging to recover the group's sum.
                sum += average * numberOfValues;
                totalNumberOfValues += numberOfValues;
            }
            double average = sum / totalNumberOfValues;
            context.write(new Text("Average and Minimum and Max is"),
                    new Text(average + " and " + min + " and " + max));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "MapReduceAssignment");
        job.setJarByClass(AverageMinMax.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setCombinerClass(Combiner.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
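So my first problem is that I don't know how to convert the date in the mapper, or how to get two keys and show them in the output. In other words, how should I rewrite this code?
Thanks for your help.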
Answer 0 (score: 0)
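OK, it looks like you have several problems. Two come to mind right away:
1. Your records are written under a constant key such as 'Combiner_output'. That won't work. You want a real key, probably the city name, so 'Calgary' in your example. That is easy to get with value.toString().split(",")[0] (i.e., split value on the , character and take the first element of the resulting array).
2. The reducer output can be fixed with context.write(new Text(key.toString() + " Average and Minimum and Max is"), new Text(average.toString()+" and "+ min.toString()+" and "+ max.toString())); where key is the city name mentioned above.
As for how to extract dates in Java, take a look at the following SO post: Extracting dates from string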
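The key and date handling mentioned in this answer could be sketched roughly as follows. This is plain Java without Hadoop types so that it runs on its own. The class name `CityYearKey` and the method `fromLine` are hypothetical, and the sketch assumes that column 0 is the city and column 2 is an ISO date (yyyy-MM-dd), as in the sample line from the question.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of the question's code: builds a "city year"
// grouping key from one comma-separated input line.
final class CityYearKey {
    private CityYearKey() {}

    // Assumption: field 0 is the city, field 2 is a date like 2009-01-07.
    static String fromLine(String line) {
        String[] fields = line.split(",");
        String city = fields[0].trim();
        // LocalDate.parse accepts ISO dates and rejects malformed ones.
        int year = LocalDate.parse(fields[2].trim()).getYear();
        return city + " " + year;
    }

    public static void main(String[] args) {
        String sample = "Calgary,AB,2009-01-07,604680,12694,2.5207754";
        System.out.println(fromLine(sample)); // prints "Calgary 2009"
    }
}
```

Inside a mapper, the returned string would be wrapped in a Text and used as the output key, so that all records for one city and year reach the same reduce call.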
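In general, I would suggest that you start by reading up on MapReduce itself, its design trade-offs, and how to get the most out of it within the Hadoop architecture.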
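One trade-off of that kind shows up in the question's code. Hadoop may run a combiner zero, one, or several times, so a combiner's output has to have the same shape as its input, and the reducer has to produce the same result either way. A common way to get there is to pass around a partial aggregate that can be merged in any order. The sketch below is plain Java; the class `PartialStats` and its "count,sum,min,max" text encoding are assumptions made for illustration, not something from the question or from Hadoop.

```java
// Hypothetical value type: a partial aggregate that can be merged repeatedly.
final class PartialStats {
    final long count;
    final double sum;
    final double min;
    final double max;

    PartialStats(long count, double sum, double min, double max) {
        this.count = count;
        this.sum = sum;
        this.min = min;
        this.max = max;
    }

    // Aggregate describing one single reading.
    static PartialStats of(double value) {
        return new PartialStats(1, value, value, value);
    }

    // Merging is associative and commutative, so it does not matter how
    // often, or in which order, partial results are combined.
    PartialStats merge(PartialStats other) {
        return new PartialStats(
                count + other.count,
                sum + other.sum,
                Math.min(min, other.min),
                Math.max(max, other.max));
    }

    double average() {
        return sum / count;
    }

    // Assumed text encoding "count,sum,min,max", suitable for a Text value.
    String encode() {
        return count + "," + sum + "," + min + "," + max;
    }

    static PartialStats decode(String text) {
        String[] p = text.split(",");
        return new PartialStats(
                Long.parseLong(p[0]),
                Double.parseDouble(p[1]),
                Double.parseDouble(p[2]),
                Double.parseDouble(p[3]));
    }
}
```

With this shape, the mapper would emit `PartialStats.of(v).encode()` for each reading, and both the combiner and the reducer would decode, merge, and encode again. Only the reducer would call `average()` at the very end.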
Answer 1 (score: 0)
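Your question is not entirely clear, so what follows rests on some assumptions about the input and the output you want.
If those assumptions are correct, I suggest you use Prof. Jeremy Lin's custom datatypes. A possible solution looks like this:
Your key will be the location and the year, combined in a Text.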
String line = value.toString();
String[] tokens = line.split(",");
String[] date = tokens[2].split("-");
String year = date[0];
String location = tokens[0];
Text locationYear = new Text(location + " " + year);
Your value will be an ArrayListOfDoublesWritable, which you can take from the repo I mentioned above.
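ArrayListOfDoublesWritable readings = new ArrayListOfDoublesWritable();
// Readings start at column 5; skip empty fields such as the ",," gaps in the sample line.
for (int i = 5; i < tokens.length; i++) {
    if (!tokens[i].isEmpty()) {
        readings.add(Double.parseDouble(tokens[i]));
    }
}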
Then you can emit the mapper output as a Text key and an ArrayListOfDoublesWritable value.
context.write(locationYear, readings);
From there, you can work on the mapper output in the reducer and compute the average, min, and max using the Collections-style methods of the array list.
I hope this helps.
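As a rough illustration of that reducer-side calculation, here is a plain-Java sketch that uses `java.util.DoubleSummaryStatistics` instead of the custom datatype, so that it runs without Hadoop. The class `ReadingSummary`, its method names, and the four-decimal output format are assumptions. Each `double[]` stands in for the readings of one record that shares the same location and year key.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Locale;

// Hypothetical helper showing the per-key calculation a reducer would perform.
final class ReadingSummary {
    private ReadingSummary() {}

    // Collects count, sum, min, and max over every reading of every record.
    static DoubleSummaryStatistics summarize(List<double[]> readingsPerRecord) {
        return readingsPerRecord.stream()
                .flatMapToDouble(readings -> Arrays.stream(readings))
                .summaryStatistics();
    }

    // Formats one output line in the layout requested in the question.
    static String format(String key, DoubleSummaryStatistics stats) {
        return String.format(Locale.ROOT, "%s Average is: %.4f Max: %.4f Min: %.4f",
                key, stats.getAverage(), stats.getMax(), stats.getMin());
    }

    public static void main(String[] args) {
        List<double[]> records = Arrays.asList(
                new double[] {1.0, 3.0},
                new double[] {5.0});
        System.out.println(format("Calgary 2009", summarize(records)));
    }
}
```

In an actual reducer, the same loop would run over the Iterable of values for one key, and the formatted line would be written with context.write.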