I'm new to Jsoup. I'm trying to transform the following example:
<div>
text that <strong>need</strong> to be <strong>wrapped</strong>
<p>a text that has to be ignored</p>
another text that <strong>need</strong> to be <strong>wrapped</strong>
</div>
into this:
<div>
<p>text that <strong>need</strong> to be <strong>wrapped</strong></p>
<p>a text that has to be ignored</p>
<p>another text that <strong>need</strong> to be <strong>wrapped</strong></p>
</div>
So I need to wrap in a <p> all text that is not already inside a <p>.
I tried something like this:
Document doc = Jsoup.parse(html);
doc.body().traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        if (node instanceof TextNode && Arrays.asList("div", "body").contains(node.parentNode().nodeName())) {
            Node auxNode = node;
            node.replaceWith(pNode);
            node.childNodes();
            while (auxNode.nextSibling() != null && Arrays.asList("em", "strong").contains(auxNode.nextSibling().nodeName())) {
                node.after(auxNode);
                auxNode.remove();
                auxNode = node.nextSibling();
            }
            node.wrap("<p></p>");
        }
    }

    @Override
    public void tail(Node node, int depth) { }
});
But I keep getting a NullPointerException in the while condition.
Thanks in advance.
java.lang.NullPointerException
at HTMLToArticleParser$1.head(HTMLToArticleParser.java:52)
at org.jsoup.select.NodeTraversor.traverse(NodeTraversor.java:31)
at org.jsoup.nodes.Node.traverse(Node.java:536)
at HTMLToArticleParser.parse(HTMLToArticleParser.java:47)
at HTMLToArticleParser_Tests.jTest(HTMLToArticleParser_Tests.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
Answer 0 (score: 1)
The NodesToProcess class:
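import java.util.List;
import org.jsoup.nodes.Node;

// Holds one pending rewrite: the node to replace, its replacement, and the siblings to drop.
public class NodesToProcess {
    private final Node oldNode;
    private final NewNode newNode;
    private final List<Node> toRemove;

    public NodesToProcess(Node oldNode, NewNode newNode, List<Node> toRemove) {
        this.oldNode = oldNode;
        this.newNode = newNode;
        this.toRemove = toRemove;
    }

    // The first text node of the run; it is replaced by the new node.
    public Node getOldNode() {
        return oldNode;
    }

    // The replacement node built by NewNode.
    public Node getNewNode() {
        return newNode.getNewNode();
    }

    // The remaining nodes of the run, removed once the replacement is in place.
    public List<Node> getToRemove() {
        return toRemove;
    }
}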
This is the method that wraps the unwrapped text:
private void wrapUnwrappedTextInTagP(Element element) {
    List<NodesToProcess> nodesToProcesses = new ArrayList<>();
    List<Node> nodeAlreadyUsed = new ArrayList<>();

    // First pass: only collect the runs to wrap, without touching the tree.
    element.childNodes().forEach(node -> {
        if (node instanceof TextNode && !nodeAlreadyUsed.contains(node)) {
            List<Node> newChilds = new ArrayList<>();
            List<Node> toRemove = new ArrayList<>();
            newChilds.add(node);
            nodeAlreadyUsed.add(node);

            // Extend the run over the following siblings; the null check comes first.
            Node auxNode = node.nextSibling();
            while (auxNode != null && parentIsBodyAndIsAnTextElement(auxNode)) {
                newChilds.add(auxNode);
                nodeAlreadyUsed.add(auxNode);
                toRemove.add(auxNode);
                auxNode = auxNode.nextSibling();
            }
            nodesToProcesses.add(new NodesToProcess(node, new NewNode(newChilds), toRemove));
        }
    });

    // Second pass: replace the first node of each run and remove the rest.
    nodesToProcesses.forEach(nodesToProcess -> {
        nodesToProcess.getOldNode().replaceWith(nodesToProcess.getNewNode());
        nodesToProcess.getToRemove().forEach(Node::remove);
    });
}
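The method above calls parentIsBodyAndIsAnTextElement, which is not shown in this answer. One possible version is sketched below. It assumes, based on the code in the question, that the containers of interest are body and div and that the inline tags are em and strong. The class name InlineNodeCheck and both lists are hypothetical, and jsoup must be on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical helper: the tag lists are assumptions taken from the question's code.
public class InlineNodeCheck {
    private static final List<String> CONTAINERS = Arrays.asList("body", "div");
    private static final List<String> INLINE_TAGS = Arrays.asList("em", "strong");

    // True when the node sits directly in a container and is text or an inline tag.
    static boolean parentIsBodyAndIsAnTextElement(Node node) {
        Node parent = node.parentNode();
        if (parent == null || !CONTAINERS.contains(parent.nodeName())) {
            return false;
        }
        return node instanceof TextNode || INLINE_TAGS.contains(node.nodeName());
    }
}
```

Checking the parent for null before reading its name avoids the kind of NullPointerException reported in the question when a node has already been detached.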
Then, in the main method:
Document doc = Jsoup.parse(html);
wrapUnwrappedTextInTagP(doc.body());
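The remaining class, NewNode, takes the list of child nodes in its constructor and returns the replacement node from getNewNode().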
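NewNode itself is not listed in this answer. A minimal sketch follows, assuming it wraps copies of the collected nodes in a new <p> element. The cloning, the use of Tag.valueOf, and the demo method are assumptions made for illustration, not the answerer's code, and jsoup must be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Tag;

// Hypothetical reconstruction of NewNode: a <p> holding copies of the given nodes.
public class NewNode {
    private final Element paragraph;

    public NewNode(List<Node> children) {
        paragraph = new Element(Tag.valueOf("p"), "");
        for (Node child : children) {
            // Clone so the originals stay in the tree until the replace/remove pass.
            paragraph.appendChild(child.clone());
        }
    }

    public Node getNewNode() {
        return paragraph;
    }

    // Demo (hypothetical): wrap every child of a small <div> in a single <p>.
    static Element demo() {
        Document doc = Jsoup.parse("<div>loose <strong>text</strong></div>");
        Element div = doc.select("div").first();
        List<Node> run = new ArrayList<>(div.childNodes()); // snapshot of the children
        Node wrapper = new NewNode(run).getNewNode();
        run.get(0).replaceWith(wrapper);                    // first node becomes the <p>
        for (Node rest : run.subList(1, run.size())) {
            rest.remove();                                  // the others now live in the <p> as copies
        }
        return div;
    }

    public static void main(String[] args) {
        System.out.println(demo().html());
    }
}
```

Cloning in the constructor keeps the original tree untouched while the runs are still being collected, which fits the two-pass structure of wrapUnwrappedTextInTagP: appending the original nodes directly would move them out of their parent in the middle of the iteration.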