How to wrap part of a text with <span> or any other HTML tag without escaping the new HTML structure?

Posted: 2019-06-22 16:35:08

Tags: java jsoup

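I'm matching specific strings in an element's text and want to wrap the matched text in a span, so that I can select it and apply modifications later, but the HTML gets escaped. Is there a way to insert the string with its HTML tags without them being escaped?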

I tried the unescapeEntities() method, but it doesn't help in this case, and wrap() doesn't work as expected either. For reference on these methods, see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html

Current code:

for (Element div : doc.select("div")) {
    for (String input : listOfStrings) {
        if (div.ownText().contains(input)) {
            div.text(div.ownText().replaceFirst(input, "<span class=\"select-me\">" + input + "</span>"));
        }
    }
}

Desired output:

<div>some text <span class="select-me">matched string</span></div>

Actual output:

<div>some text &lt;span class=&quot;select-me&quot;&gt;matched string&lt;/span&gt;</div>
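For context on why the output is escaped: Element.text(String) removes the element's children and stores its argument as a single plain TextNode, so "<span ...>" is just characters and gets escaped on output, whereas Element.html(String) parses its argument as markup. A minimal sketch contrasting the two (class and method names are hypothetical; jsoup is assumed to be on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TextVersusHtml {

    // text(String) stores the argument as plain text, so the tags are escaped on output
    static Element setWithText() {
        Document doc = Jsoup.parse("<div>some text matched string</div>");
        Element div = doc.select("div").first();
        div.text("some text <span class=\"select-me\">matched string</span>");
        return div;
    }

    // html(String) parses the argument, so <span> becomes a real child element
    static Element setWithHtml() {
        Document doc = Jsoup.parse("<div>some text matched string</div>");
        Element div = doc.select("div").first();
        div.html("some text <span class=\"select-me\">matched string</span>");
        return div;
    }

    public static void main(String[] args) {
        System.out.println(setWithText().outerHtml()); // contains &lt;span ...
        System.out.println(setWithHtml().outerHtml()); // contains a real <span class="select-me">
    }
}
```

Note that html(String) replaces all of the element's children, so it only fits when rebuilding the element's content from a string is acceptable; the answers below work on TextNodes instead.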

2 answers:

Answer 0 (score: 3)

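Based on your question and comments, it seems you only want to modify the direct text nodes of the selected element, without touching the text nodes of elements nested inside it. So for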

<div>a b <span>b c</span></div> 

if we want to modify b, we only modify the one placed directly in <div>, not the one in <span>.

<div>a b <span>b c</span></div> 
       ^       ^----don't modify because it is in <span>, not *directly* in <div>
       |
     modify
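The distinction between direct and nested text can be checked with jsoup's own methods: text() includes text from nested elements, while ownText() and textNodes() only look at the element's direct children. A small sketch (class and method names are hypothetical; jsoup assumed on the classpath):

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class DirectTextDemo {

    static Element parseDiv() {
        return Jsoup.parse("<div>a b <span>b c</span></div>").select("div").first();
    }

    // Raw text of each TextNode that is a direct child of the element
    static List<String> directTexts(Element el) {
        List<String> out = new ArrayList<>();
        for (TextNode tn : el.textNodes()) {
            out.add(tn.getWholeText());
        }
        return out;
    }

    public static void main(String[] args) {
        Element div = parseDiv();
        System.out.println(div.text());       // a b b c  (includes the text inside <span>)
        System.out.println(div.ownText());    // a b      (direct text only)
        System.out.println(directTexts(div)); // the single direct TextNode "a b "
    }
}
```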

Text is not an element node like <div> or <span>; in the DOM it is represented as a TextNode. So if we have a structure like <div> a <span>b</span> c </div>, its DOM representation is

Element: <div>
├ Text: " a "
├ Element: <span>
│ └ Text: "b"
└ Text: " c "
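The tree above can be reproduced by walking the div's childNodes() and printing whether each child is a TextNode or an Element. A sketch (class and method names are hypothetical; jsoup assumed on the classpath):

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class DomTreeDemo {

    // One entry per direct child of the first <div>, in document order
    static List<String> describeChildren(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        List<String> out = new ArrayList<>();
        for (Node child : div.childNodes()) {
            if (child instanceof TextNode) {
                out.add("Text: \"" + ((TextNode) child).getWholeText() + "\"");
            } else if (child instanceof Element) {
                out.add("Element: <" + ((Element) child).tagName() + ">");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describeChildren("<div> a <span>b</span> c </div>")) {
            System.out.println(line);
        }
    }
}
```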

If we want to wrap part of the text in a <span> (or any other tag), we are effectively splitting a single TextNode

├ Text: "foo bar baz"

into the following series:

├ Text: "foo "
├ Element: <span>
│ └ Text: "bar"
└ Text: " baz"
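The split pictured above can be done with two splitText calls followed by wrap on the middle node. A minimal sketch (class and method names are hypothetical; jsoup assumed on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class SplitWrapDemo {

    static Element splitAndWrap() {
        Element div = Jsoup.parse("<div>foo bar baz</div>").select("div").first();
        TextNode whole = div.textNodes().get(0); // "foo bar baz"
        // splitText keeps the left part in the original node and returns the right part,
        // which is already inserted as the next sibling
        TextNode middle = whole.splitText(4);    // whole = "foo ", middle = "bar baz"
        middle.splitText(3);                     // middle = "bar", new sibling = " baz"
        middle.wrap("<span></span>");            // middle now sits inside a new <span>
        return div;
    }

    public static void main(String[] args) {
        System.out.println(splitAndWrap().html());
    }
}
```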

To build a solution on this idea, the TextNode API gives us a very limited set of tools, but among the available methods we can use

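  • splitText(index), which modifies the original TextNode so that it keeps the "left" part of the split, and returns a new TextNode holding the rest (the right part). For example, if TextNode node1 holds "foo bar", then after TextNode node2 = node1.splitText(3); node1 holds "foo" and node2 holds " bar", placed as node1's immediate next sibling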
  • wrap(htmlElement) (inherited from the superclass Node), which wraps the TextNode in an element built from htmlElement; for example, node.wrap("<span class='myClass'>") results in <span class='myClass'>text from node</span>

With the above "tools" we can create a method like this:

static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {

    // getWholeText() is the raw text that splitText() indexes into;
    // text() collapses whitespace, so its indices can be off
    while (textNode.getWholeText().contains(strToWrap)) {
        // separates the part before strToWrap
        // and returns the node starting with the text we want
        TextNode rightNodeFromSplit = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));

        // if there is more text after the searched string we need to
        // separate it and handle it in the next iteration
        if (rightNodeFromSplit.getWholeText().length() > strToWrap.length()) {
            textNode = rightNodeFromSplit.splitText(strToWrap.length());
            // after separating the remaining part, rightNodeFromSplit holds
            // only the part we were looking for, so let's wrap it
            rightNodeFromSplit.wrap(wrapperHTML);
        } else { // here we know that the node holds only the text to wrap
            rightNodeFromSplit.wrap(wrapperHTML);
            return; // textNode didn't change, but everything has already been handled
        }
    }
    }
}

We can use it like this:

Document doc = Jsoup.parse("<div>b a b <span>b c</span> d b</div> ");
System.out.println("BEFORE CHANGES:");
System.out.println(doc);

Element id1 = doc.select("div").first();
for (TextNode textNode : id1.textNodes()) {
    wrapTextWithElement(textNode, "b", "<span class='x'>");
}

System.out.println();
System.out.println("AFTER CHANGES");
System.out.println(doc);
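To connect this back to the question's loop over divs and input strings, here is a self-contained sketch (class and method names are hypothetical; jsoup assumed on the classpath) that applies the same split-and-wrap idea to every div's direct text nodes. It takes indices from getWholeText(), the raw text that splitText() operates on, and assumes the search strings are non-empty:

```java
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class WrapMatches {

    // Same idea as wrapTextWithElement above; strToWrap is assumed to be non-empty
    static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
        while (textNode.getWholeText().contains(strToWrap)) {
            TextNode match = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
            if (match.getWholeText().length() > strToWrap.length()) {
                textNode = match.splitText(strToWrap.length()); // continue with the remainder
                match.wrap(wrapperHTML);
            } else {
                match.wrap(wrapperHTML); // the match was at the very end of the text
                return;
            }
        }
    }

    // Replacement for the question's loop: only the direct text of each <div> is touched
    static void wrapAll(Document doc, List<String> inputs, String wrapperHTML) {
        for (Element div : doc.select("div")) {
            for (String input : inputs) {
                // textNodes() returns a new list, so wrapping while iterating is safe
                for (TextNode textNode : div.textNodes()) {
                    wrapTextWithElement(textNode, input, wrapperHTML);
                }
            }
        }
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse("<div>some text matched string</div>");
        wrapAll(doc, Arrays.asList("matched string"), "<span class=\"select-me\"></span>");
        System.out.println(doc.body().html());
    }
}
```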

Answer 1 (score: 1)

Detailed explanation in the comments:

import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class StackOverflow56717248 {

    public static void main(String[] args) {
        List<String> listOfStrings = new ArrayList<>();
        listOfStrings.add("INPUT");
        Document doc = Jsoup.parse(
                "<div id=\"1\">some text 1</div>" +
                "<div id=\"2\"> node before <b>xxx</b> this one contains INPUT text <b>xxx</b> node after</div>");
        System.out.println("BEFORE: ");
        System.out.println(doc);
        // iterating over all the divs
        for (Element div : doc.select("div")) {
            // and input texts
            for (String input : listOfStrings) {
                // to find the one with desired text
                if (div.ownText().contains(input)) {
                    // when found we have to be aware that this node may not be the only child
                    // so we have to iterate over children nodes
                    for (int i = 0; i < div.childNodeSize(); i++) {
                        Node child = div.childNode(i);
                        // taking into account only TextNodes
                        if (child instanceof TextNode && ((TextNode) child).getWholeText().contains(input)) {
                            TextNode textNode = ((TextNode) child);
                            // once we find a matching one, we split the text node into two
                            // at the position of the desired text; the right part
                            // is inserted as the next sibling node
                            // (getWholeText() is the raw text splitText() indexes into; text() collapses whitespace)
                            int indexOfInputText = textNode.getWholeText().indexOf(input);
                            textNode.splitText(indexOfInputText);
                            // getting the next node (the one newly created!)
                            TextNode nodeWithInput = (TextNode) textNode.nextSibling();
                            // split again only if there is more text after the input text;
                            // splitText() rejects an offset equal to the text length
                            if (nodeWithInput.getWholeText().length() > input.length()) {
                                nodeWithInput.splitText(input.length());
                            }
                            // now this node contains only the input text, so we can wrap it with whatever we want
                            nodeWithInput.wrap("<span class=\"select-me\"></span>");
                            break;
                        }
                    }
                }
            }
        }
        System.out.println("--------");
        System.out.println("RESULT:");
        System.out.println(doc);
    }

}
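When locating a match to pass to splitText(), the index should come from the same text that splitText() cuts. TextNode.text() collapses runs of whitespace, while splitText() works on the raw text returned by getWholeText(), so with a newline before the match the two indices differ. A small check (class and method names are hypothetical; jsoup assumed on the classpath):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class WholeTextDemo {

    static final String HTML = "<div>a\n  INPUT rest</div>";

    // Splits the div's first text node at the index of "INPUT" and returns the right part
    static String tailAfterSplit(boolean useWholeText) {
        Element div = Jsoup.parse(HTML).select("div").first();
        TextNode node = div.textNodes().get(0);
        // text() turns "\n  " into a single space; getWholeText() keeps it as-is
        String source = useWholeText ? node.getWholeText() : node.text();
        return node.splitText(source.indexOf("INPUT")).getWholeText();
    }

    public static void main(String[] args) {
        System.out.println("[" + tailAfterSplit(false) + "]"); // index taken from text()
        System.out.println("[" + tailAfterSplit(true) + "]");  // index taken from getWholeText()
    }
}
```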