I am writing a map function. I have a text file like this:
364.2 366.6 365.2 0 0 1 10421
364.2 366.6 365.2 0 0 1 10422
I want to output columns 1 and 3. Here is my code, but it outputs every line.
public static class SumMap extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text str = new Text();
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        StringTokenizer lineIter = new StringTokenizer(value.toString(), "\\r?\\n");
        while (lineIter.hasMoreTokens()) {
            StringTokenizer tokenIter = new StringTokenizer(lineIter.nextToken(), "\\s+");
            while (tokenIter.hasMoreTokens()) {
                String v1 = tokenIter.nextToken();
                String v2 = tokenIter.nextToken();
                String c1 = tokenIter.nextToken();
                String c2 = tokenIter.nextToken();
                str.set(v1+c1);
                context.write(str, one);
            }
        }
    }
}
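In this code, the input should first be split into lines ("\\r?\\n"), and then each line should be split into its numbers/tokens on whitespace ("\\s+"). Finally, v1+c1 should be written out. How do I change my code?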
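The splitting above assumes StringTokenizer understands regular expressions, which it does not: its second constructor argument is a set of single delimiter characters. The standalone sketch below (plain Java, no Hadoop; TokenizerDemo, tokens and defaultTokens are illustrative names, not part of any API) tokenizes the first sample line both ways.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Collects every token StringTokenizer produces for an explicit delimiter set.
    static List<String> tokens(String line, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // Uses StringTokenizer's default delimiter set: " \t\n\r\f" (whitespace).
    static List<String> defaultTokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "364.2 366.6 365.2 0 0 1 10421";
        // "\\s+" is the three characters '\', 's' and '+'; none occur in the line,
        // so the whole line comes back as a single token.
        System.out.println(tokens(line, "\\s+").size());  // 1
        // The default delimiters split on whitespace, giving the 7 columns.
        System.out.println(defaultTokens(line).size());   // 7
    }
}
```

The same applies to "\\r?\\n", which is read as the characters '\', 'r', '?' and 'n'. When regex-based splitting is really wanted, String.split is the method that takes a regular expression.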
Answer 0 (score: 0)
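The problem is the number of tokens produced versus the number you read. In the inner while loop, each line produces 7 tokens, but you read only 4 per pass. What you want is to read all of a line's tokens in one pass. Since you only need columns 1 and 3, pick those out and store them separately.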
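Assuming the delimiters really are whitespace, each sample line yields exactly 7 tokens, so the question's inner loop reads 4 on its first pass and then fails with NoSuchElementException when the second pass finds only 3 left. The plain-Java sketch below (ColumnPicker and key are hypothetical names, not from the original code) reads one line once and keeps only columns 1 and 3, concatenated the same way as str.set(c1 + c3) in the mapper.

```java
import java.util.StringTokenizer;

class ColumnPicker {
    // Hypothetical helper: column 1 + column 3 of one 7-column line.
    static String key(String line) {
        StringTokenizer tok = new StringTokenizer(line); // default delimiters: whitespace
        if (tok.countTokens() != 7) {
            throw new IllegalArgumentException("expected 7 columns, got " + tok.countTokens());
        }
        String c1 = tok.nextToken();
        tok.nextToken();                 // column 2 is skipped
        String c3 = tok.nextToken();
        // Columns 4 to 7 are left unread; nothing else is needed from this line.
        return c1 + c3;
    }

    public static void main(String[] args) {
        System.out.println(key("364.2 366.6 365.2 0 0 1 10421")); // 364.2365.2
    }
}
```

Checking countTokens() up front is one way to make a short or long line fail loudly instead of silently pairing columns from the wrong positions.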
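import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// SumMap is nested inside the driver class SumMR, hence "static".
public static class SumMap extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text str = new Text();

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // With TextInputFormat (the job default), value already holds one line, so no lineIter is needed.
        // The one-argument constructor splits on whitespace (" \t\n\r\f"); the two-argument
        // form takes a set of delimiter characters, not a regex such as "\\s+".
        StringTokenizer tokenIter = new StringTokenizer(value.toString());
        while (tokenIter.hasMoreTokens()) {
            // Consume all 7 columns per pass; only c1 and c3 are used.
            String c1 = tokenIter.nextToken();
            String c2 = tokenIter.nextToken();
            String c3 = tokenIter.nextToken();
            String c4 = tokenIter.nextToken();
            String c5 = tokenIter.nextToken();
            String c6 = tokenIter.nextToken();
            String c7 = tokenIter.nextToken();
            str.set(c1 + c3);
            context.write(str, one);
        }
    }
}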
The driver (main method):
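import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException, ClassNotFoundException {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "sum");
    job.setJarByClass(SumMR.class);
    job.setMapperClass(SumMap.class);
    // job.setCombinerClass(IntSumReducer.class);
    // job.setReducerClass(IntSumReducer.class);
    // With no reducer set, Hadoop's default Reducer passes each (key, value) pair through unchanged.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    TextInputFormat.addInputPath(job, new Path(args[1]));
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}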
That's the modified code. Let me know if anything doesn't work!
Answer 1 (score: 0)
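If you use TextInputFormat, the map key is the byte offset of the line within the file (not its line number) and the value is the line's content. You don't need to split the input into lines; just split each line: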
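@Override
protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
    // Split on runs of whitespace; trim() first so leading spaces don't yield an empty first column.
    String[] vals = value.toString().trim().split("\\s+");
    // Emit only for well-formed 7-column lines; `one` is SumMap's IntWritable(1) field.
    if (vals.length == 7) {
        context.write(new Text(vals[0] + vals[2]), one);   // columns 1 and 3
    }
}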
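The per-line logic of this map() can be checked without a cluster. In the sketch below (SplitSketch and keysFor are hypothetical names), the loop stands in for Hadoop calling map() once per line; each line is trimmed, split with the regex "\\s+" via String.split, and contributes column 1 + column 3 only when it has exactly 7 columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Simulates one map() call per line and collects the keys that would be written.
    static List<String> keysFor(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            String[] vals = line.trim().split("\\s+"); // regex split, unlike StringTokenizer
            if (vals.length == 7) {
                keys.add(vals[0] + vals[2]);           // columns 1 and 3
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "364.2 366.6 365.2 0 0 1 10421",
            "  364.2 366.6 365.2 0 0 1 10422",   // leading spaces are trimmed away
            "bad line");                          // skipped: not 7 columns
        System.out.println(keysFor(lines));      // [364.2365.2, 364.2365.2]
    }
}
```

The length check means a malformed line is simply dropped rather than throwing an exception that would fail the map task.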