I am trying to remove &nbsp; from my string using a regular expression. Here is the program.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTest {
    private static final StringBuffer testRegex =
        new StringBuffer("<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
            "<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
            "<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font>" +
            "<BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>Test</p><strong>" +
            "<FONT color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
            "<br><p>TestTest</p><br><BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>" +
            "<FONT style=\"BACKGROUND-COLOR: #ffcc66\">TestTestTestTestTest</font><br>" +
            "<p>TestTestTestTest</p></blockquote><br><p>" +
            "<FONT style=\"BACKGROUND-COLOR: #003333\">TestTestTest</font></p><p>" +
            "<FONT style=\"BACKGROUND-COLOR: #003399\">TestTest</font></p><p> </p>");
    //"This is test<P>Tag Tag</P>";

    public static void main(String[] args) {
        System.out.println("***Testing***");
        String temp = checkRegex(testRegex);
        System.out.println("***FINAL = "+temp);
    }
    private static String checkRegex(StringBuffer sample){
        Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");
        Matcher matcher = pattern.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String group = matcher.group();
            System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);
            String substring = sample.substring(start, end);
            System.out.println(" Substring = "+substring);
            String replacedSubString = substring.replaceAll("&nbsp;"," ");
            System.out.println("Replaced Substring = "+replacedSubString);
            sample.replace(start, end, replacedSubString);
            System.out.println(" NEW SAMPLE = "+sample);
        }
        System.out.println("********WHILE OVER ********");
        return sample.toString();
    }
}
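I get a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using Java's Pattern and Matcher to find &nbsp; and replace it with " ". Does anyone know what is causing this? What should I do to remove the extra &nbsp; from my string?
Thanks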
Answer 0 (score: 1)
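Call matcher.reset(); after
sample.replace(start, end, replacedSubString);
This is needed because once you modify sample, the end position recorded by the matcher can point to an invalid location. So you need to call matcher.reset(); after every replace.
For example, if start is 0 and end is 5, then after you replace &nbsp; with " " the buffer is shorter and end points to an invalid position; find then throws a StringIndexOutOfBoundsException if that position lies beyond the length of the string.
If the string is large, resetting can become a serious performance bottleneck, because find starts matching from the beginning again. Instead, you can make the matcher continue from the last matched position.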
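One way to continue from the last match, rather than rescanning from the start, might look like the sketch below. It assumes Matcher.region(int, int) is used for this; the answer's own snippet is not preserved here, so that choice, the class name RegionResetSketch, the method name stripNbspInTags and the sample input are all illustrative. region() resets the matcher, so it picks up the buffer's new length, and then limits the search to the given range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionResetSketch {
    // Replaces &nbsp; inside tags, editing the StringBuffer in place.
    static String stripNbspInTags(StringBuffer sample) {
        Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");
        Matcher matcher = pattern.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            // Read the match before the buffer is modified
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // Assumption: resume just after the replaced text. region() resets
            // the matcher, so its bounds reflect the shortened buffer.
            matcher.region(start + replaced.length(), sample.length());
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        StringBuffer input = new StringBuffer(
            "<FONT&nbsp;color=red>a&nbsp;b</FONT><p&nbsp;dir=ltr>x</p>");
        // Only the &nbsp; inside tags is replaced; the one in the text stays
        System.out.println(stripNbspInTags(input));
    }
}
```

Each position in the buffer is scanned at most once this way, whereas calling reset() after every replacement rescans the already cleaned prefix.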
答案 1 :(得分:0)
// change the group and it is source string is automatically updated
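Strings in Java are immutable, so there is no way to change one in place; what you are asking for is impossible.
Removing or replacing a pattern with a string can be achieved with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as your line
m.group().replaceAll("something","");
seems to intend, the best solution is probably to use a StringBuffer for the result, together with Matcher.appendReplacement and Matcher.appendTail.
Example: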
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // For example: transform match to upper case
    String replacement = m.group().toUpperCase();
    m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"
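The point about immutability and the reassignment pattern can be illustrated with a minimal sketch. The class name ImmutableReplaceSketch and the method name stripNbsp are hypothetical, and the sample string is made up for illustration.

```java
class ImmutableReplaceSketch {
    // Returns a new String; the argument itself is never modified.
    static String stripNbsp(String html) {
        return html.replaceAll("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String original = "<p>a&nbsp;b</p>";

        // The result is discarded here, so original is unchanged
        original.replaceAll("&nbsp;", " ");
        System.out.println(original); // still contains &nbsp;

        // Assigning the result is what makes the replacement visible
        String cleaned = stripNbsp(original);
        System.out.println(cleaned);
    }
}
```

This is enough when every occurrence should be replaced regardless of where it appears; the appendReplacement approach is only needed when the replacement depends on each individual match.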
Answer 2 (score: 0)
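You need to create a new StringBuffer to hold the replaced string, and then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. It is possible to do this in place, but the approach above is the most straightforward.
Here is your checkRegex method, rewritten: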
private static String checkRegex(String inputString){
    Pattern pattern = Pattern.compile("<[^>]+?&nbsp;[^<]+?>");
    Matcher matcher = pattern.matcher(inputString);
    // Create a new StringBuffer to hold the string after replacement
    StringBuffer replacedString = new StringBuffer();
    while (matcher.find()) {
        // matcher.group() returns the substring that matches the whole regex
        String substring = matcher.group();
        System.out.println(" Substring = "+substring);
        String replacedSubstring = substring.replaceAll("&nbsp;"," ");
        System.out.println("Replaced Substring = "+replacedSubstring);
        // appendReplacement is a clean way to append the text that comes
        // before a match, followed by the replacement for the matched text.
        // Note that appendReplacement gives $ in the replacement string a
        // special meaning (it refers to text matched by a capturing group).
        // Matcher.quoteReplacement is needed to pass a literal string as the
        // replacement.
        matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));
        System.out.println(" NEW SAMPLE = "+replacedString);
    }
    // appendTail appends the text after the last match to the
    // replaced string.
    matcher.appendTail(replacedString);
    System.out.println("********WHILE OVER ********");
    return replacedString.toString();
}
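To see why the comments above recommend Matcher.quoteReplacement, here is a small sketch with a replacement string that contains a dollar sign. The class name QuoteReplacementSketch, the method name replacePrices and the "PRICE" placeholder are hypothetical and serve only as an illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementSketch {
    // Replaces every "PRICE" placeholder with the given text, taken literally.
    static String replacePrices(String text, String literalReplacement) {
        Matcher m = Pattern.compile("PRICE").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Without quoteReplacement, "$1" would be read as a reference to
            // capturing group 1, which this pattern does not have.
            m.appendReplacement(sb, Matcher.quoteReplacement(literalReplacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replacePrices("cost: PRICE", "$1.50")); // cost: $1.50
    }
}
```

In the checkRegex rewrite the replacement comes from the input HTML, which could contain $ or \ characters, so quoting it is the safer default.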