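Text to String in MapReduce

Time: 2017-02-03 23:15:05

Tags: mapreduce hadoop2 hortonworks-sandbox

I am trying to split a string using MapReduce2 (YARN) in the Hortonworks Sandbox. If I try to access val[1], it throws an ArrayIndexOutOfBoundsException; the job works fine when I don't split the input.

Mapper: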

public class MapperClass extends Mapper<Object, Text, Text, Text> {

    private Text airline_id;
    private Text name;
    private Text country;
    private Text value1;

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {

        String s = value.toString();
        if (s.length() > 1) {

            String val[] = s.split(",");
            context.write(new Text("blah"), new Text(val[1]));
        }


    }
}

Reducer:

public class ReducerClass extends Reducer<Text, Text, Text, Text> {

private Text result = new Text();

public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {

    String airports = "";

    if (key.equals("India")) {
        for (Text val : values) {
            airports += "\t" + val.toString();
        }
        result.set(airports);
        context.write(key, result);
    }
}
}

MainClass:

public class MainClass {

public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

    Configuration conf = new Configuration();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "Flights MR");

    job.setJarByClass(MainClass.class);
    job.setMapperClass(MapperClass.class);
    job.setReducerClass(ReducerClass.class);

    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    job.setInputFormatClass(KeyValueTextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);

}

}

Can you help?

Update

Figured it out: the Text was not being converted to a String.

1 answer:

Answer 0 (score: 0)

If the string you are splitting contains no comma, the resulting String[] has length 1, and the whole string sits in val[0].
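To make that behaviour concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. The class name SplitDemo, the helper fieldCount, and the sample lines are made up for illustration and do not come from the question's data.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: number of fields String.split(",") yields for a line.
    static int fieldCount(String line) {
        return line.split(",").length;
    }

    public static void main(String[] args) {
        // Made-up sample lines: one without a comma, one with three fields.
        String[] samples = {"no commas here", "1,Air India,India"};
        for (String s : samples) {
            // Print each line next to the array that split(",") produces for it.
            System.out.println("\"" + s + "\" -> " + Arrays.toString(s.split(",")));
        }
    }
}
```

For the line without a comma, the array has a single element, so reading index 1 is out of bounds.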

Currently, you only make sure the string is longer than one character

if (s.length() > 1)

but you do not check whether the split actually produces an array with more than one element; you simply assume a split occurred.

context.write(new Text("blah"), new Text(val[1]));

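If no split occurs, this causes an out-of-bounds error. One possible solution is to make sure the string contains at least one comma, instead of only checking its length, like this: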

String s = value.toString();
if (s.indexOf(',') > -1) {

    String val[] = s.split(",");
    context.write(new Text("blah"), new Text(val[1]));
}
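One caveat with the comma check: String.split(",") drops trailing empty strings, so a line such as "1," contains a comma but still yields a one-element array. A possibly safer variant is to test the array length directly. The sketch below assumes that approach; SafeSplit and secondField are hypothetical names, and the sample lines are made up.

```java
import java.util.Optional;

public class SafeSplit {
    // Hypothetical helper: returns the second comma-separated field, if there is one.
    static Optional<String> secondField(String line) {
        String[] val = line.split(",");
        // Check the array length itself instead of inferring it from the input.
        if (val.length > 1) {
            return Optional.of(val[1]);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample lines, including one with a trailing comma.
        System.out.println(secondField("1,Air India,India"));
        System.out.println(secondField("1,"));
        System.out.println(secondField("no commas here"));
    }
}
```

In the mapper, the same idea would be to call context.write only when val.length > 1, and to skip or count the malformed lines otherwise.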