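I am matching specific strings in the text of elements and want to wrap each matched string in a span so that I can select it and apply modifications later, but the HTML entities are being escaped. Is there a way to unescape a string that contains HTML tags?
I tried the `unescapeEntities()` method, but it does not work in this case. `wrap()` did not work for me either.
For reference on these methods, see https://jsoup.org/apidocs/org/jsoup/parser/Parser.html
Current code: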
for (Element div : doc.select("div")) {
    for (String input : listOfStrings) {
        if (div.ownText().contains(input)) {
            div.text(div.ownText().replaceFirst(input, "<span class=\"select-me\">" + input + "</span>"));
        }
    }
}
Desired output:
<div>some text <span class="select-me">matched string</span></div>
Actual output:
<div>some text &lt;span class="select-me"&gt;matched string&lt;/span&gt;</div>
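For context, the difference between the two outputs can be reproduced in isolation. The sketch below assumes jsoup is on the classpath; the class and method names are made up for illustration. It contrasts `Element.text(String)`, which stores its argument as plain text and therefore escapes `<` and `>` on output, with `Element.html(String)`, which parses its argument as markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class, not part of the original question.
class TextVersusHtml {

    // Sets the div content with text(): the markup is treated as literal text.
    static String setWithText(String markup) {
        Document doc = Jsoup.parse("<div>some text</div>");
        doc.outputSettings().prettyPrint(false); // keep the output on one line
        Element div = doc.selectFirst("div");
        div.text(markup);
        return div.outerHtml();
    }

    // Sets the div content with html(): the markup is parsed into child nodes.
    static String setWithHtml(String markup) {
        Document doc = Jsoup.parse("<div>some text</div>");
        doc.outputSettings().prettyPrint(false);
        Element div = doc.selectFirst("div");
        div.html(markup);
        return div.outerHtml();
    }

    public static void main(String[] args) {
        String markup = "some text <span class=\"select-me\">matched string</span>";
        System.out.println(setWithText(markup));
        System.out.println(setWithHtml(markup));
    }
}
```

Note that `html(String)` replaces all children of the element, so combining it with `ownText()` as in the code above would discard any nested elements. It should also only be given markup you trust.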
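Answer 0 (score: 3)
Based on your question and comments, it looks like you want to modify only the direct text nodes of the selected element, without modifying the text nodes of any inner elements it contains. So for
<div>a b <span>b c</span></div>
if we want to modify `b`, we modify only the one placed directly in `<div>`, not the one inside `<span>`.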
<div>a b <span>b c</span></div>
       ^       ^----don't modify because it is in <span>, not *directly* in <div>
       |
     modify
Text is not treated as an element node the way `<div>`, `<span>`, etc. are; in the DOM it is represented as a `TextNode`. So if we have a structure like `<div> a <span>b</span> c </div>`, its DOM representation is
Element: <div>
├ Text: " a "
├ Element: <span>
│ └ Text: "b"
└ Text: " c "
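To inspect this structure yourself, the following sketch lists the direct children of the first `<div>` and labels each one as text or element. It assumes jsoup is on the classpath; `ChildNodeDump` and `describeChildren` are illustrative names, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper for illustration only.
class ChildNodeDump {

    // Describes each direct child of the first <div> in the given HTML.
    static List<String> describeChildren(String html) {
        Element div = Jsoup.parse(html).selectFirst("div");
        List<String> lines = new ArrayList<>();
        for (Node child : div.childNodes()) {
            if (child instanceof TextNode) {
                // getWholeText() returns the text without whitespace normalisation
                lines.add("Text: \"" + ((TextNode) child).getWholeText() + "\"");
            } else if (child instanceof Element) {
                lines.add("Element: <" + ((Element) child).tagName() + ">");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeChildren("<div> a <span>b</span> c </div>").forEach(System.out::println);
    }
}
```

Only direct children are listed, so the text inside `<span>` does not appear; that is the same distinction the answer relies on when it modifies only the text placed directly in `<div>`.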
If we want to wrap part of some text in a `<span>` (or any other tag), we are effectively splitting a single `TextNode`
├ Text: "foo bar baz"
into the following series:
├ Text: "foo "
├ Element: <span>
│ └ Text: "bar"
└ Text: " baz"
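The transformation pictured above can be sketched with jsoup's `TextNode.splitText(int)` and `Node.wrap(String)`. This is a minimal illustration that handles only the first text node and the first match, assuming jsoup is on the classpath; `SplitAndWrapDemo` and `wrapFirst` are made-up names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Hypothetical demo class for illustration only.
class SplitAndWrapDemo {

    // Wraps the first occurrence of `word` in the div's first text node in a <span>
    // and returns the div's inner HTML.
    static String wrapFirst(String html, String word) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().prettyPrint(false); // avoid added whitespace in the output
        Element div = doc.selectFirst("div");
        TextNode left = div.textNodes().get(0);
        int start = left.getWholeText().indexOf(word);
        if (start < 0) {
            return div.html(); // nothing to wrap
        }
        // `left` keeps the text before the match; `middle` starts at the match
        TextNode middle = left.splitText(start);
        // cut off whatever follows the match, if anything does
        if (middle.getWholeText().length() > word.length()) {
            middle.splitText(word.length());
        }
        // `middle` now holds exactly the matched text
        middle.wrap("<span></span>");
        return div.html();
    }

    public static void main(String[] args) {
        System.out.println(wrapFirst("<div>foo bar baz</div>", "bar"));
    }
}
```

The second split is guarded because nothing needs to be cut off when the match sits at the very end of the text node.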
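The `TextNode` API gives us a very limited set of tools for building a solution on that idea, but among the available methods we can use two:
`splitText(index)` modifies the original `TextNode`, leaving the left side of the split in it, and returns a new `TextNode` that holds the remaining (right) side. For example, if `TextNode node1` holds "foo bar", then after `TextNode node2 = node1.splitText(3);` `node1` holds "foo", while `node2` holds " bar" and is placed as the immediate next sibling of `node1`.
`wrap(htmlElement)` (inherited from the superclass `Node`) wraps the `TextNode` in an element built from `htmlElement`. For example, `node.wrap("<span class='myClass'>")` results in `<span class="myClass">text from node</span>`.
With these tools we can write a method like this: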
// Requires: import org.jsoup.nodes.TextNode;
static void wrapTextWithElement(TextNode textNode, String strToWrap, String wrapperHTML) {
    // getWholeText() is used rather than text(), because text() normalises
    // whitespace and its indexes may not match the offsets splitText() uses
    while (textNode.getWholeText().contains(strToWrap)) {
        // separates the part before strToWrap
        // and returns the node starting with the text we want
        TextNode rightNodeFromSplit = textNode.splitText(textNode.getWholeText().indexOf(strToWrap));
        // if there is more text after the searched string, we need to
        // separate it and handle it in the next iteration
        if (rightNodeFromSplit.getWholeText().length() > strToWrap.length()) {
            textNode = rightNodeFromSplit.splitText(strToWrap.length());
            // after separating the remaining part, rightNodeFromSplit holds
            // only the part we were looking for, so let's wrap it
            rightNodeFromSplit.wrap(wrapperHTML);
        } else { // here we know that the node holds only the text to wrap
            rightNodeFromSplit.wrap(wrapperHTML);
            return; // textNode didn't change, but everything is already handled
        }
    }
}
We can use it like this (the document is printed before and after the change):
Document doc = Jsoup.parse("<div>b a b <span>b c</span> d b</div> ");
System.out.println("BEFORE CHANGES:");
System.out.println(doc);
Element id1 = doc.select("div").first();
for (TextNode textNode : id1.textNodes()) {
    wrapTextWithElement(textNode, "b", "<span class='x'>");
}
System.out.println();
System.out.println("AFTER CHANGES");
System.out.println(doc);
Answer 1 (score: 1)
Detailed explanation in the code comments:
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class StackOverflow56717248 {

    public static void main(String[] args) {
        List<String> listOfStrings = new ArrayList<>();
        listOfStrings.add("INPUT");
        Document doc = Jsoup.parse(
                "<div id=\"1\">some text 1</div>" +
                "<div id=\"2\"> node before <b>xxx</b> this one contains INPUT text <b>xxx</b> node after</div>");
        System.out.println("BEFORE: ");
        System.out.println(doc);
        // iterating over all the divs
        for (Element div : doc.select("div")) {
            // and input texts
            for (String input : listOfStrings) {
                // to find the one with the desired text
                if (div.ownText().contains(input)) {
                    // when found, be aware that this node may not be the only child,
                    // so we have to iterate over the child nodes
                    for (int i = 0; i < div.childNodeSize(); i++) {
                        Node child = div.childNode(i);
                        // taking into account only TextNodes
                        // (getWholeText() keeps the raw text that splitText() offsets refer to)
                        if (child instanceof TextNode && ((TextNode) child).getWholeText().contains(input)) {
                            TextNode textNode = (TextNode) child;
                            // split the text node at the position of the desired text;
                            // splitText() returns the newly created next sibling node
                            int indexOfInputText = textNode.getWholeText().indexOf(input);
                            TextNode nodeWithInput = textNode.splitText(indexOfInputText);
                            // split again if there is more text after the input text;
                            // skipped when nothing follows the match, since splitText()
                            // expects an offset inside the text
                            if (nodeWithInput.getWholeText().length() > input.length()) {
                                nodeWithInput.splitText(input.length());
                            }
                            // now this node contains only the input text, so we can wrap it
                            nodeWithInput.wrap("<span class=\"select-me\"></span>");
                            break;
                        }
                    }
                }
            }
        }
        System.out.println("--------");
        System.out.println("RESULT:");
        System.out.println(doc);
    }
}