Using jsoup

Time: 2018-06-05 10:04:23

Tags: html jsoup

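I want to change the text content of HTML elements by wrapping certain keywords in a span with a background color. The HTML looks like this: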

  <html>
   <head></head>
   <body>Gc <br>
   Stable <br>
   Oral intake better <br>
   Urine stool normal <br>
   </body>
</html>

I have the following keywords to match:

Gc,Stable,Oral,Urine

I have the HTML as a string:

"<html><head></head><body>Gc <br>Stable <br>Oral intake better <br>Urine stool normal <br>Pain Relief <br>Vital stable <br>No problem <br>Adv tab pan 40mg 1od <br>Tab pcm500mg 6hourly <br>Cab gab 300mg 1bd <br>Cab becasol 1od <br>Cab Tramadol 50mg 6hourly   </body></html>"

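I want to match the text content of the elements against the keywords. Wherever a keyword matches, I want to replace it with a span that has a background color and contains the matched keyword text.

The resulting HTML should look like this: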

<html>
 <head></head>
 <body>
  <div>   
   <div>
     <span style="background: #FF9999;">Gc</span> 
    <br><span style="background: #FF9999;">Stable</span> 
    <br><span style="background: #FF9999;">Oral</span> intake better 
    <br><span style="background: #FF9999;">Urine</span> stool normal 
    <br>Pain Relief 
    <br>Vital stable 
    <br>No problem 
    <br>Adv tab pan 40mg 1od 
    <br>Tab pcm500mg 6hourly 
    <br>Cab gab 300mg 1bd 
    <br>Cab becasol 1od 
    <br>Cab Tramadol 50mg 6hourly 
   </div>  
  </div>
 </body>
</html>

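How can I implement this in Java? I am using the jsoup library.

This code works for me. Is it the best approach, or is there a better way to replace the HTML string?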

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;
import org.jsoup.select.Elements;

public class regexReplaceHtml {

    public static void main(String args[]) throws IOException {

        String html2 = "<html><head></head><body>Gc <br>Stable <br>Oral intake better <br>Urine stool normal <br>Pain Relief <br>Vital stable <br>No problem <br>Adv tab pan 40mg 1od <br>Tab pcm500mg 6hourly <br>Cab gab 300mg 1bd <br>Cab becasol 1od <br>Cab Tramadol 50mg 6hourly   </body></html>";


        String html = "<div>" + html2 + "</div>";

        Document doc = Jsoup.parse(html);

        List<String> keywords = new ArrayList<String>();
        keywords.add("Gc");
        keywords.add("Stable");
        keywords.add("Oral");
        keywords.add("Urine");

        String convertedString = replaceHtmlString(doc.html(),keywords);

        System.out.println(convertedString);

    }

   public static String replaceHtmlString(String html, List<String> keywords) {
        String htmlString = "<div>" + html + "</div>";

        Document doc = Jsoup.parse(htmlString);
        Elements elements = doc.body().children().select("*");

        for (Element element : elements) {

            List<TextNode> tnList = element.textNodes();

            for (TextNode tn : tnList) {
                String nodeTrimmedText = tn.text().trim();

                for (int i = 0; i < keywords.size(); i++) {
                    String keyword = keywords.get(i);
                    if (isContainExactWord(nodeTrimmedText, keyword)) {
                        String nodeText = tn.text();
                        String keywordHtmlString = "<span style=\"background: #FF9999;\">" + keyword + "</span>";
                        String replacedTextHtmlString = nodeText.replace(keyword, keywordHtmlString);
                        tn.text(replacedTextHtmlString);
                    }
                }

            }
        }

        //I had to replace the &lt; and &gt; with the respective symbols
        return doc.html().replaceAll("&lt;", "<").replaceAll("&gt;", ">");
    }

    private static boolean isContainExactWord(String fullString, String partWord) {
        String pattern = "\\b" + partWord + "\\b";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(fullString);
        return m.find();
    }

}
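
One detail in the `isContainExactWord` helper above is that the keyword is concatenated into the pattern unquoted, so a keyword containing regex metacharacters such as `.` or `(` would change what the pattern matches. The following is a minimal sketch of a stricter variant, assuming a non-empty keyword list and case-sensitive matching as in the question. The class `KeywordMatcher` and its method `containsKeyword` are hypothetical names used for illustration and are not part of jsoup.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: compiles one pattern for the whole keyword list.
final class KeywordMatcher {
    private final Pattern pattern;

    KeywordMatcher(List<String> keywords) {
        // Pattern.quote makes each keyword match literally; "|" joins them into one alternation.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // \b on both sides restricts matches to whole words.
        this.pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // Returns true if any keyword occurs in the text as a whole word.
    boolean containsKeyword(String text) {
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        KeywordMatcher matcher =
                new KeywordMatcher(Arrays.asList("Gc", "Stable", "Oral", "Urine"));
        System.out.println(matcher.containsKeyword("Oral intake better"));
        System.out.println(matcher.containsKeyword("Vital stable"));
    }
}
```

Compiling the pattern once also avoids recompiling it for every text node and keyword pair, which the nested loops in the question do.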

2 answers:

Answer 0 (score: 0)

The following code should do what you want. It takes a list of keywords and replaces each one with the span tag you mentioned.

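// Assumes doc is the jsoup Document parsed from the HTML string, e.g. Jsoup.parse(html).
List<String> keywords = new ArrayList<String>();
keywords.add("Gc");
keywords.add("Stable");
keywords.add("Oral");
keywords.add("Urine");

Element body = doc.getElementsByTag("body").first();

// textNodes() returns only the direct child text nodes of <body>.
List<TextNode> nodes = body.textNodes();

for(TextNode node : nodes){
    for(String keyword : keywords){
        // Re-read the text for every keyword so an earlier replacement is not overwritten.
        String nodeText = node.text();

        // contains() is a plain substring check, not a whole-word match.
        if(nodeText.contains(keyword)){
            String newText = nodeText.replace(keyword, "");
            node.text(newText);

            // The span is inserted before the text node, so it lands in the right place
            // only when the keyword is at the start of the node's text.
            node.before("<span style=\"background-color:#FF9999;\">" + keyword + "</span>");
        }
    }
}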

Answer 1 (score: 0)

There is a regex solution:

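
A regex-only approach could look like the following sketch. It is not the answerer's original code; it is one possible way to do the replacement directly on the HTML string, assuming the markup is simple (no `>` inside attribute values, no scripts or comments containing keywords) and the keyword list is non-empty. The class `RegexHighlighter` and its method `highlight` are hypothetical names. The pattern matches either a whole tag, which is copied through unchanged, or a keyword between word boundaries, which is wrapped in the span.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: highlights keywords in an HTML string using only java.util.regex.
final class RegexHighlighter {

    static String highlight(String html, List<String> keywords) {
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // Group 1 captures a complete tag; group 2 captures a keyword as a whole word.
        Pattern pattern = Pattern.compile("(<[^>]*>)|\\b(" + alternation + ")\\b");
        Matcher matcher = pattern.matcher(html);

        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = matcher.group(1) != null
                    ? matcher.group(1) // a tag: keep it exactly as it was
                    : "<span style=\"background: #FF9999;\">" + matcher.group(2) + "</span>";
            // quoteReplacement stops "$" and "\" in the replacement from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body>Gc <br>Stable <br>Oral intake better "
                + "<br>Urine stool normal <br>Vital stable </body></html>";
        System.out.println(highlight(html, Arrays.asList("Gc", "Stable", "Oral", "Urine")));
    }
}
```

Because tags are consumed by the first alternative, a keyword that happens to appear inside a tag name or attribute is left alone, and no `&lt;`/`&gt;` unescaping step is needed afterwards. For markup that is not this regular, manipulating the parsed jsoup document is likely the more robust choice.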