Question

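Beginner level here - I can't get my head around this.

I wrote a MapReduce program that produces a list of words along with the line numbers they appear on, e.g.

and: 4
if: 1,2,3

etc. (see the sample text below).

My code works fine when line 2 is not blank. But it throws an ArrayIndexOutOfBoundsException: 1 error for the sample text below.

From my understanding, this error means I am trying to access an array element that does not exist - in this case, the array has no element for the text of line 2. But how do I edit my code so that it ignores blank lines?

Here is the Mapper code (with the sample text):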

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// SampleText (contents of the input file, commented out so the class compiles):
// 1: if you prick us do we not bleed
// 2: 
// 3: if you tickle us do we not laugh
// 4: if you poison us do we not_ die and
// 5: ***if you wrong us shall we not revenge

public class IIndexMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final static Text listing = new Text();
    private Text wordText = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
        //Single out line number by splitting each line by colon - first part being lineID
        String[] line = value.toString().split(": ");
        String lineID = line[0];
        listing.set(lineID);

        //Further split second part of the line by spaces
        String textStr = line[1];

        //Create an array of words contained in each line
        String [] tokens = textStr.split(" ");
        int count = tokens.length;

        for (int i = 0; i < count; i++) {
            wordText.set(tokens[i]);
            context.write(wordText, listing);
        }
    }
}

Any help is much appreciated!

Thanks!

Answer 1

I rewrote the code above as follows, so that we check that an element exists before accessing it in the String array.

String[] line = value.toString().split(": ");
if (line.length >= 2) {

    listing.set(line[0]);

    //Create an array of words contained in each line
    String[] tokens = line[1].split(" ");

    for (String token : tokens) {
        wordText.set(token);
        context.write(wordText, listing);
    }
} else {
    // increase the counter for tracking bad lines
    // getCounter needs a group name and a counter name; "IIndexMapper" is just an example group
    context.getCounter("IIndexMapper", "INVALID_LINES").increment(1);
}
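
If you want to try the length check without running a Hadoop job, here is a minimal plain-Java sketch of the same logic. The class name InvertedIndexSketch, the addLine method and the invalidLines field are made up for illustration (they are not part of the original job), and the plain int field only stands in for the Hadoop counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the mapper logic; all names here are hypothetical.
class InvertedIndexSketch {

    // word -> IDs of the lines it appears on
    final Map<String, List<String>> index = new TreeMap<>();

    // Stands in for the Hadoop counter of skipped lines
    int invalidLines = 0;

    void addLine(String raw) {
        String[] line = raw.split(": ");
        if (line.length >= 2) {
            // Safe: line[1] is only read after the length check
            for (String token : line[1].split(" ")) {
                index.computeIfAbsent(token, k -> new ArrayList<>()).add(line[0]);
            }
        } else {
            // Blank line such as "2: " ends up here instead of throwing
            invalidLines++;
        }
    }

    public static void main(String[] args) {
        InvertedIndexSketch sketch = new InvertedIndexSketch();
        sketch.addLine("1: if you prick us do we not bleed");
        sketch.addLine("2: ");
        sketch.addLine("3: if you tickle us do we not laugh");
        System.out.println(sketch.index.get("if")); // prints [1, 3]
        System.out.println(sketch.invalidLines);    // prints 1
    }
}
```

The blank line is counted rather than silently dropped, which makes it easy to notice if an input file contains far more bad lines than expected.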

Answer 2

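You first need to understand how String.split works. It splits the string into parts around matches of the given regular expression. In your example, it is trying to split on ": ". For the second line, there is nothing after the ": ", and split drops trailing empty strings, so the line array is only one element long, with a single String in line[0]. When you try to access line[1], you get an ArrayIndexOutOfBoundsException because it doesn't exist.

Here is a simple example: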

String sampleString = "2: ";

String[] line = sampleString.split(": ");

//Length is 1
System.out.println(line.length);

// Value is 2
System.out.println("0: " + line[0]);
// ArrayIndexOutOfBoundsException
System.out.println("1: " + line[1]);

You can guard against this by doing the following:

// split always returns at least one element, so check for a second one
if (line.length > 1) {
    //Further split second part of the line by spaces
    String textStr = line[1];

    //Create an array of words contained in each line
    String[] tokens = textStr.split(" ");
    int count = tokens.length;

    for (int i = 0; i < count; i++) {
        wordText.set(tokens[i]);
        context.write(wordText, listing);
    }
}
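
As a side note on the split behaviour described above, the two-argument form String.split(regex, limit) keeps trailing empty strings when the limit is non-zero. The sketch below compares the variants and wraps them in a small helper. The SplitDemo class and the tokensOf method are hypothetical names for illustration, and trimming and splitting on "\\s+" is my own assumption about how you would want whitespace handled.

```java
import java.util.Arrays;

// Illustrative only: SplitDemo and tokensOf are not part of the original mapper.
class SplitDemo {

    // Returns the words after the "lineID: " prefix, or an empty array for blank lines.
    static String[] tokensOf(String rawLine) {
        // Limit 2: at most two parts, and an empty second part is kept
        String[] parts = rawLine.split(": ", 2);
        if (parts.length < 2 || parts[1].trim().isEmpty()) {
            return new String[0];
        }
        // Split on runs of whitespace so double spaces do not produce empty tokens
        return parts[1].trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Default split drops the trailing empty string
        System.out.println(Arrays.toString("2: ".split(": ")));     // prints [2]
        // A negative limit keeps it
        System.out.println(Arrays.toString("2: ".split(": ", -1))); // prints [2, ]
        System.out.println(tokensOf("2: ").length);                 // prints 0
        System.out.println(Arrays.toString(tokensOf("1: if you prick us")));
    }
}
```

Either approach works. The length check is the smaller change to the existing mapper, while the helper also copes with a line that is completely empty or has no space after the colon.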

java.lang.ArrayIndexOutOfBoundsException: MapReduce
