Java regular expression to replace only the text in an HTML file

Time: 2016-12-12 14:10:19

Tags: java regex

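I have to write some Java code that highlights text in an HTML file displayed in a JTextPane.

To highlight, I replace "match" with "<span style=\"background-color: #FFFF00\">match</span>" and set the whole replaced text in the JTextPane. Everything works fine! I did this with the help of java.util.regex.Pattern and java.util.regex.Matcher.

Now I have found a problem: the matcher also matches text inside the HTML tags. For example, take this line: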

<pre><a name="hello-world">Hello World</a></pre>

I need a regular expression for a java.util.regex.Pattern that searches only within the text "Hello World".

So, if I want to highlight matches of "e", the result should be

<pre><a name="hello-world">H<span style="background-color: #FFFF00">e</span>llo World</a></pre>
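For contrast, here is a minimal sketch of the naive approach described above, applied to the same line. The class name `NaiveHighlight` and the method `highlightEverywhere` are hypothetical and are not the asker's actual code; the sketch only assumes that the replacement runs over the whole HTML string with Pattern and Matcher.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveHighlight {
    static final String OPEN = "<span style=\"background-color: #FFFF00\">";
    static final String CLOSE = "</span>";

    // Hypothetical helper: wraps every occurrence of term, including those inside tags.
    static String highlightEverywhere(String html, String term) {
        // Pattern.quote treats the term as a literal, not as a regex.
        Matcher m = Pattern.compile(Pattern.quote(term)).matcher(html);
        return m.replaceAll(Matcher.quoteReplacement(OPEN + term + CLOSE));
    }

    public static void main(String[] args) {
        String line = "<pre><a name=\"hello-world\">Hello World</a></pre>";
        // The "e" in the tag names "pre" and in the attribute is wrapped too,
        // which breaks the markup.
        System.out.println(highlightEverywhere(line, "e"));
    }
}
```

On this line the sketch wraps five occurrences of "e" (in both `pre` tag names, in `name`, in `hello-world`, and in the visible text), although only the one in "Hello World" should be highlighted.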

Thank you very much for your help!!

2 answers:

Answer 0 (score: 0)

I would do something like this:

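// Capture the text between a '>' and the next '<'.
Pattern pattern = Pattern.compile(">([^<>]+)<");
Matcher matcher = pattern.matcher(matchedTextBuilder.toString());
while (matcher.find()) {
    // group(1) is the text without the surrounding angle brackets.
    String matchedFoundText = matcher.group(1);
}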

A better approach:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HighlightText {
    public static void main(String[] args) {
        String originalString = "dfedf >Hello< href= ui /> Hello< another";
        StringBuilder sb = new StringBuilder();
        // Group 1 captures the text between '>' and the next '<',
        // so multi-word text such as "Hello World" is matched as well.
        Pattern pattern = Pattern.compile(">([^<>]+)<");
        Matcher matcher = pattern.matcher(originalString);
        int endIndex = 0;
        while (matcher.find()) {
            // Copy everything up to and including the '>' unchanged.
            sb.append(originalString, endIndex, matcher.start(1));
            // Highlight only inside the captured text.
            sb.append(matcher.group(1).replace("e",
                    "<span style=\"background-color: #FFFF00\">e</span>"));
            sb.append("<");
            endIndex = matcher.end();
        }
        // Append whatever follows the last match.
        sb.append(originalString.substring(endIndex));
        System.out.println(sb);
    }
}
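An alternative to splitting the string by hand is a single pattern with a negative lookahead. The following is only a sketch and not part of the original answer: the class name `TagAwareHighlight` and the method `highlight` are hypothetical. It assumes reasonably well-formed markup in which `<` and `>` occur in text only as entities and attribute values contain no `>`. It also treats the contents of comments, `<script>`, and `<style>` as ordinary text, and it cannot match a term that is split across tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagAwareHighlight {
    static final String OPEN = "<span style=\"background-color: #FFFF00\">";
    static final String CLOSE = "</span>";

    // Hypothetical helper: highlights term only where it is not inside a tag.
    static String highlight(String html, String term) {
        if (term.isEmpty()) {
            return html; // an empty term would match at every position
        }
        // The negative lookahead rejects a match when the next angle bracket
        // after it is '>', which means the match sits inside a tag.
        Pattern p = Pattern.compile(Pattern.quote(term) + "(?![^<>]*>)");
        Matcher m = p.matcher(html);
        // $0 re-inserts the matched text between the opening and closing span.
        return m.replaceAll(Matcher.quoteReplacement(OPEN) + "$0"
                + Matcher.quoteReplacement(CLOSE));
    }

    public static void main(String[] args) {
        String line = "<pre><a name=\"hello-world\">Hello World</a></pre>";
        // prints <pre><a name="hello-world">H<span style="background-color: #FFFF00">e</span>llo World</a></pre>
        System.out.println(highlight(line, "e"));
    }
}
```

Because the term goes through `Pattern.quote`, search strings containing regex metacharacters such as `.` or `(` are matched literally. For HTML that does not meet the assumptions above, a real parser is the safer choice.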

Answer 1 (score: 0)

Try Jsoup, an HTML parser that can fetch and parse HTML from a URL, a file, or a string, and can also manipulate HTML elements, attributes, and text. Example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class NewClass2 {

    public static void main(String args[]) {
        String html = " <!DOCTYPE html>\n" +
                        "<html>\n" +
                            "<head>\n" +
                                "<title>Page Title</title>\n" +
                            "</head>\n" +
                            "<body>\n" +
                                "<h1>This is a Heading which should match</h1>\n" +
                                "<p>This is a paragraph which should also match.</p>\n" +
                            "</body>\n" +
                        "</html> ";

        String matchWord = "match";
        Document doc = Jsoup.parse(html);
        System.out.println("before : \n");
        System.out.println(doc.toString()+"\n");

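        // Select the elements whose own text (not that of their children) contains the word.
        Elements matchingElements = doc.getElementsContainingOwnText(matchWord);
        for (Element e : matchingElements) {
            // Caution: e.html() includes the markup of child elements, so an attribute
            // value of a child that contains the word would be replaced as well.
            e.html(e.html().replace(matchWord,"<span style=\"background-color: #FFFF00\">"+matchWord+"</span>"));
        }
        System.out.println("after : \n");
        System.out.println(doc.toString());
    }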
}