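I am working on a MapReduce job. I have two datasets. I need to join them on ID and count the occurrences of each ID separately for each context. (For example, if the data lists travel agencies operating in several states, I need output in the format: user ID - number of visits in NY, number of visits in IL.) The dataset contains a field such as state:' NY'. I have a predefined set of states (NY, IL).
In the reduce step I always get zero for the counts, even though the data is present. For every ID, my output is UID 0 0. Here is my code: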
```java
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class myMap {

    /* Map: emits (UID, state) for each input line */
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            String line = value.toString();
            // Each line is a comma-separated list of name:value pairs
            StringTokenizer tokens = new StringTokenizer(line, ",");
            String UID = "", state = "";
            while (tokens.hasMoreTokens()) {
                String currToken = tokens.nextToken();
                String[] keyValue = currToken.split(":");
                if (keyValue[0].equals("state")) {
                    // Only surrounding whitespace is removed here
                    state = keyValue[1].trim();
                }
                if (keyValue[0].equalsIgnoreCase("user")) {
                    UID = keyValue[1];
                }
            }
            output.collect(new Text(UID), new Text(state));
        }
    }

    /* Reducer: counts NY and IL values per UID */
    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
            int nyCnt = 0;
            int ilCnt = 0;
            while (values.hasNext()) {
                String currValue = values.next().toString();
                if (currValue.equalsIgnoreCase("NY")) {
                    nyCnt += 1;
                }
                if (currValue.equalsIgnoreCase("IL")) {
                    ilCnt += 1;
                }
                // Also emits every raw value that was seen for this key
                output.collect(key, new Text(currValue));
            }
            String counts = Integer.toString(nyCnt) + " " + Integer.toString(ilCnt);
            output.collect(key, new Text(counts));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(myMap.class);
        conf.setJobName("myMap");
        conf.setJarByClass(myMap.class);
        conf.setMapperClass(Map.class);
        // The same Reduce class is used as both combiner and reducer
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```
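One way to narrow this down is to run the mapper's parsing outside Hadoop and look at the exact state string that would reach the reducer. The sketch below is only that: `ParseCheck` and `parse` are hypothetical names, and the sample record `user:u1,state:' NY'` is an assumed layout based on the field format described above, not a line from the real data.

```java
import java.util.StringTokenizer;

class ParseCheck {
    // Mirrors the tokenizing logic of the map() method above.
    // Returns {UID, state} for one input line.
    static String[] parse(String line) {
        StringTokenizer tokens = new StringTokenizer(line, ",");
        String uid = "", state = "";
        while (tokens.hasMoreTokens()) {
            String[] keyValue = tokens.nextToken().split(":");
            if (keyValue[0].equals("state")) {
                state = keyValue[1].trim();
            }
            if (keyValue[0].equalsIgnoreCase("user")) {
                uid = keyValue[1];
            }
        }
        return new String[] { uid, state };
    }

    public static void main(String[] args) {
        // Assumed sample record; the real input layout may differ.
        String[] parsed = parse("user:u1,state:' NY'");
        // Brackets make stray quotes or spaces visible.
        System.out.println("[" + parsed[1] + "]");                // prints [' NY']
        System.out.println(parsed[1].equalsIgnoreCase("NY"));     // prints false
    }
}
```

If the real records do look like the assumed sample, the value compared in the reducer would still carry the quotes and the inner space, so neither `equalsIgnoreCase` check would match. That would be consistent with the `0 0` counts, though it has not been confirmed against the actual input.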
Any help with the error would be appreciated. Thanks.