(Java) Map class cannot read the second field after split() on the input text

Time: 2016-11-20 21:42:39

Tags: java dictionary split reduce

I am trying to use the cite_95.txt file to extract the cited patents. Here is my MapClass code:

public static class MapClass
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  // private final static IntWritable one = new IntWritable(1);

  private Text cited = new Text();
  private Text citing = new Text();
  private IntWritable count = new IntWritable(1);

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] itr = value.toString().split(",");
    citing.set(itr[0]);
    cited.set(itr[1]);
    context.write(cited, count);
  }
}

When I run this job, it throws the following error:

  

Error: java.lang.ArrayIndexOutOfBoundsException: 1

Why can't it read the text at index [1]?

Thanks.

Edit:

Hi, here is my complete code sample:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PatentCount1 {

  public static class MapClass
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text cited = new Text();
    private Text citing = new Text();
    private IntWritable count = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] itr = value.toString().split(",");

      cited.set(itr[1]);
      context.write(cited, count);
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(PatentCount1.class);
    job.setMapperClass(MapClass.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

When I read the first field instead, it works:

 cited.set(itr[0]);
 context.write(cited,count);
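
The two observations (itr[0] works, itr[1] throws) are consistent with at least one input line that contains no comma, or that ends right after the comma. This is only one possible explanation and is not confirmed by the question. For a line without a comma, String.split(",") returns a one-element array holding the whole line, and trailing empty fields are dropped, so index 1 does not exist in either case. A minimal sketch that shows this behavior outside Hadoop (the class name SplitLengthDemo and the sample lines other than the first are made up for illustration):

```java
import java.util.Arrays;

class SplitLengthDemo {
    // Number of fields that split(",") produces for one input line.
    public static int fieldCount(String line) {
        return line.split(",").length;
    }

    public static void main(String[] args) {
        // Hypothetical lines: a normal record, a line without a comma,
        // an empty line, and a record whose second field is missing.
        String[] samples = {"4658385,37596", "4658385", "", "4658385,"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> "
                + Arrays.toString(s.split(",")) + " length=" + fieldCount(s));
        }
    }
}
```

Only the first sample yields two fields; the other three yield one, so reading itr[1] on any of them would throw ArrayIndexOutOfBoundsException: 1.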

Edit:

The input file looks like this:

  • "CITING","CITED"
  • 4658385,37596
  • 3658699,3748596
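
Given a file shaped like this sample, one defensive approach is to validate each line before touching the second field, so that a header, blank, or malformed line is skipped instead of crashing the task. The sketch below is a suggestion, not the asker's code. The helper CitationLine.citedOf is hypothetical, and it assumes that valid patent numbers consist of digits only, as in the sample rows. It is kept free of Hadoop types so it can be run on its own.

```java
import java.util.Optional;

class CitationLine {
    // Returns the cited patent number (second field) of a line, or
    // Optional.empty() when the line has fewer than two fields or the
    // second field is not purely numeric (assumption: ids are digits only).
    public static Optional<String> citedOf(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2) {
            return Optional.empty(); // blank line or missing second field
        }
        String cited = fields[1].trim();
        if (cited.isEmpty() || !cited.chars().allMatch(Character::isDigit)) {
            return Optional.empty(); // e.g. a header line with column names
        }
        return Optional.of(cited);
    }

    public static void main(String[] args) {
        System.out.println(citedOf("4658385,37596"));  // a well-formed record
        System.out.println(citedOf("4658385"));        // no second field
        System.out.println(citedOf("column1,column2")); // non-numeric header
    }
}
```

Inside map(), the same check could guard the write: call citedOf(value.toString()), and only when the result is present call cited.set(...) followed by context.write(cited, count). Counting skipped lines somewhere would help confirm whether malformed input is really the cause.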

0 answers:

No answers yet