java.lang.StringIndexOutOfBoundsException from java.util.regex.Matcher

Time: 2013-04-23 05:06:15

Tags: java regex

I am trying to remove &nbsp; from my string using a regular expression. Here is the program.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

     public class MyTest {

    private static final StringBuffer testRegex = 
        new StringBuffer("<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font></p><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font></p><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font>" +
        "<BLOCKQUOTE&nbsp;style=\"MARGIN-RIGHT:&nbsp;0px\"&nbsp;dir=ltr><br><p>Test</p><strong>" +
        "<FONT&nbsp;color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
        "<br><p>TestTest</p><br><BLOCKQUOTE&nbsp;style=\"MARGIN-RIGHT:&nbsp;0px\"&nbsp;dir=ltr><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ffcc66\">TestTestTestTestTest</font><br>" +
        "<p>TestTestTestTest</p></blockquote><br><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#003333\">TestTestTest</font></p><p>" +
        "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#003399\">TestTest</font></p><p>&nbsp;</p>");

    //"This&nbsp;is&nbsp;test<P>Tag&nbsp;Tag</P>";

    public static void main(String[] args) {
        System.out.println("***Testing***");
        String temp = checkRegex(testRegex);
        System.out.println("***FINAL = "+temp);

    }

    private static String checkRegex(StringBuffer sample){
        Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");      
        Matcher matcher = pattern.matcher(sample);      
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String group = matcher.group();
            System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);

            String substring = sample.substring(start, end);
            System.out.println(" Substring = "+substring);
            String replacedSubString = substring.replaceAll("&nbsp;"," ");  
            System.out.println("Replaced Substring = "+replacedSubString);

            sample.replace(start, end, replacedSubString);
            System.out.println(" NEW SAMPLE = "+sample);

        }
        System.out.println("********WHILE OVER ********");
        return sample.toString();
    }

}

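I get a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using Java's Pattern and Matcher to find &nbsp; and replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?

Thanks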

3 answers:

Answer 0 (score: 1)

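Call matcher.reset(); after sample.replace(start, end, replacedSubString);

This is needed because once you replace part of the buffer sample, end points to an invalid position. So you need to call matcher.reset(); after every replace.

For example, if start is 0 and end is 5, then after you replace &nbsp; with " " the buffer is shorter and end may point to an invalid position. The find method then throws a StringIndexOutOfBoundsException if that position lies beyond the length of the string.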
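The fix described above can be sketched as follows. This is a minimal illustration rather than the answerer's own code: the class name ResetAfterReplace and the method name stripNbspInTags are made up for the example, and the input is a shortened version of the string from the question. The only change to the loop in the question is the reset() call after each replace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class ResetAfterReplace {

    // Same pattern as in the question: a tag that contains at least one &nbsp;
    private static final Pattern TAG_WITH_NBSP = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");

    static String stripNbspInTags(StringBuffer sample) {
        Matcher matcher = TAG_WITH_NBSP.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // The buffer just became shorter. reset() makes the matcher pick up
            // the new length and restart the search from index 0.
            matcher.reset();
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        StringBuffer input = new StringBuffer(
                "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font><p>&nbsp;</p>");
        System.out.println(stripNbspInTags(input));
    }
}
```

The loop still terminates, because a tag that has already been cleaned no longer contains &nbsp; and so cannot match again. The &nbsp; between <p> and </p> is left alone, since the pattern only matches inside tags.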


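If the string is large, reset() can become a serious performance bottleneck, because find() starts matching from the beginning again. You can instead tell the matcher where to resume, so that matching continues from the position of the last match.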
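One way to resume from the last match instead of rescanning is sketched below. It is an assumption about how this could be done, not necessarily the call the answer had in mind: it uses Matcher.find(int), which resets the matcher and then searches from the given index. The class name ResumeAfterReplace and the method name stripNbspInTags are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class ResumeAfterReplace {

    // Same pattern as in the question: a tag that contains at least one &nbsp;
    private static final Pattern TAG_WITH_NBSP = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");

    static String stripNbspInTags(StringBuffer sample) {
        Matcher matcher = TAG_WITH_NBSP.matcher(sample);
        int from = 0;
        // find(int) resets the matcher, so it sees the buffer's current length,
        // and then searches starting at index 'from'.
        while (matcher.find(from)) {
            int start = matcher.start();
            int end = matcher.end();
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // Continue right after the text that was just written.
            from = start + replaced.length();
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        StringBuffer input = new StringBuffer("<A&nbsp;b>t</A><C&nbsp;d&nbsp;e>");
        System.out.println(stripNbspInTags(input));
    }
}
```

Each part of the buffer is scanned only once, because the search never goes back before the end of the previous replacement.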

Answer 1 (score: 0)

// change the group and it is source string is automatically updated

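Strings in Java are immutable, so what you are asking for is impossible.

Removing a pattern from a string, or replacing it, can be done with a call like

someString = someString.replaceAll(toReplace, replacement);

To transform the matched substring, as your line

m.group().replaceAll("something","");

seems to intend, the best solution is probably to collect the result in a StringBuffer using Matcher.appendReplacement and Matcher.appendTail.

Example: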

String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";

Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();

while (m.find()) {
    // For example: transform match to upper case
    String replacement = m.group().toUpperCase();
    m.appendReplacement(sb, replacement);
}

m.appendTail(sb);

sourceString = sb.toString();

System.out.println(sourceString); // "lorem IPSUM dolor sit"

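Answer 2 (score: 0)

You need to create a new StringBuffer to hold the string after replacement, and then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. It may be possible to do this in place, but the approach above is the most straightforward.

Here is your checkRegex method, rewritten: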

private static String checkRegex(String inputString){
    Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");      
    Matcher matcher = pattern.matcher(inputString);

    // Create a new StringBuffer to hold the string after replacement
    StringBuffer replacedString = new StringBuffer();

    while (matcher.find()) {
        // matcher.group() returns the substring that matches the whole regex
        String substring = matcher.group();
        System.out.println(" Substring = "+substring);

        String replacedSubstring = substring.replaceAll("&nbsp;"," "); 
        System.out.println("Replaced Substring = "+replacedSubstring);


        // appendReplacement is a clean approach to append the text which comes
        // before a match, and append the replacement text for the matched text

        // Note that appendReplacement will interpret $ in the replacement string
        // with special meaning (for referring to text matched by capturing group).
        // Matcher.quoteReplacement is necessary to provide a literal string as
        // replacement
        matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));

        System.out.println(" NEW SAMPLE = "+replacedString);
    }

    // appendTail is used to append the text after the last match to the
    // replaced string.
    matcher.appendTail(replacedString);

    System.out.println("********WHILE OVER ********");
    return replacedString.toString();
}
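
On Java 9 or later, the same find, transform, and append loop can be written more compactly with the Matcher.replaceAll overload that takes a function. This is a sketch of an alternative, not part of the answer above. The class name FunctionalReplace and the method name stripNbspInTags are hypothetical. As with appendReplacement, the string returned by the function is interpreted as a replacement template, so Matcher.quoteReplacement is still needed to keep $ and \ literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch. Requires Java 9+.
class FunctionalReplace {

    // Same pattern as in the question: a tag that contains at least one &nbsp;
    private static final Pattern TAG_WITH_NBSP = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");

    static String stripNbspInTags(String input) {
        // The function is called once per match and returns the replacement text.
        return TAG_WITH_NBSP.matcher(input).replaceAll(
                match -> Matcher.quoteReplacement(match.group().replace("&nbsp;", " ")));
    }

    public static void main(String[] args) {
        String input = "<FONT&nbsp;style=\"BACKGROUND-COLOR:&nbsp;#ff6600\">Test</font><p>&nbsp;</p>";
        System.out.println(stripNbspInTags(input));
    }
}
```

Because the input is an immutable String and the result is built separately, the matcher never sees the text change underneath it, which avoids the exception from the question altogether.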