我是Hadoop的新手,我使用MapReduce Java Code解决了一个问题。 我有一个这样的文件,在每一行中都有一个日期和一些单词(A,B,C ..):
我实施了Map Reduce算法,每个月我都可以找到每个单词的总发生次数,为此我可以说出最大的2是什么occurrencies。 我正在考虑这种结果:
我尝试了两种不同的解决方案来寻找方法: - 第一个:
public class GiulioTest {
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line,",");
String dataAttuale = tokenizer.nextToken().substring(0, line.lastIndexOf("-"));
while (tokenizer.hasMoreTokens()) {
String prod = tokenizer.nextToken(",");
word.set(dataAttuale + ":" + prod);
context.write(word, one);
}
}
}
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable result = new IntWritable();
public void reduce(Text key, Iterator<IntWritable> values, Context context)
throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
result.set(sum)
context.write(key, result);
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setJarByClass(GiulioTest.class);
job.waitForCompletion(true);
}
}
我期待这段代码能够给我这样的结果:
然后找到一种方法来查找带有最大值的前两个字母。 实际上,我不知道在这一点上是否有办法重新计算密钥,以提取最大值。任何人都有建议吗?
我想到的另一个解决方案,但只是在伪代码中,我不知道是否可以使用MapReduce Framework: 定义文本键, 定义List listValues, 定义finalMap,//带有值的Map是另一个String和Integer的映射
mapper(key,value,context) {
month = //retrieve using String tokenizer splitting(',')
tmpKey = month;
while(itr.hasMoreToken()) {
listValues.add(itr.nextToken())
}
key.set(tmpKey)
context.put(key, listValues) //And here, there is my first doubt, if is it possible to set in context something like context(Text,List<String>)
}
reduce(Text key, Iterable<List<String>> values, Context context) {
Map<String,Int> letterVal = new ...;
for(List<String> listLetter : values) {
while(listLetter.haContent()) {
String letter = listLetter.next()
if(letterVal.contains(letter)) {
Int tmpVal = letterVal.get(letter)
letterVal.put(letter, tmpVal+1);
} else
letterVal.put(letter,1)
}
}
finalMap.put(key, letterVal)
context.write(finalMap.get(key), finalMap.toString)
}