Jsoup: join some nodes and wrap them in an element

Time: 2016-11-23 17:12:47

Tags: java jsoup

I am new to Jsoup. I am trying to modify the following example:

<div>
    text that <strong>need</strong> to be <strong>wrapped</strong>
    <p>a text that has to be ignored</p>
    another text that <strong>need</strong> to be <strong>wrapped</strong>
</div>

to get this:

<div>
    <p>text that <strong>need</strong> to be <strong>wrapped</strong></p>
    <p>a text that has to be ignored</p>
    <p>another text that <strong>need</strong> to be <strong>wrapped</strong></p>
</div>

So I need to wrap all the text that is not inside a <p> in a <p>.

I tried something like this:

Document doc = Jsoup.parse(html);
doc.body().traverse(new NodeVisitor() {
    @Override
    public void head(Node node, int depth) {
        if(node instanceof TextNode && Arrays.asList("div","body").contains(node.parentNode().nodeName())) {
            Node auxNode = node;
            node.replaceWith(pNode);
            node.childNodes();

            while (auxNode.nextSibling() != null && Arrays.asList("em", "strong").contains(auxNode.nextSibling().nodeName())) {
                node.after(auxNode);
                auxNode.remove();
                auxNode = node.nextSibling();
            }
            node.wrap("<p></p>");
        }
    }

    @Override
    public void tail(Node node, int depth) { }
});

But I just keep getting a NullPointerException in the while condition.

Thanks in advance.

java.lang.NullPointerException
    at HTMLToArticleParser$1.head(HTMLToArticleParser.java:52)
    at org.jsoup.select.NodeTraversor.traverse(NodeTraversor.java:31)
    at org.jsoup.nodes.Node.traverse(Node.java:536)
    at HTMLToArticleParser.parse(HTMLToArticleParser.java:47)
    at HTMLToArticleParser_Tests.jTest(HTMLToArticleParser_Tests.java:188)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:42)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:262)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:84)
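
The top frame of the trace is `head` itself, and the question places the failure in the `while` condition. One possible reading, assuming `auxNode` has become `null` after `auxNode = node.nextSibling()` on the last sibling: the condition then calls `auxNode.nextSibling()` on `null`. The sketch below is jsoup-free and uses a hypothetical `Item` class in place of jsoup's `Node`; it only illustrates checking the cursor itself for `null` before dereferencing it.

```java
/** Jsoup-free illustration; Item and countInlineAfter are hypothetical names. */
final class SiblingWalk {

    /** Stand-in for a DOM node: a name plus a link to the next sibling. */
    static final class Item {
        final String name;
        Item next; // null when this is the last sibling

        Item(String name) { this.name = name; }

        Item nextSibling() { return next; }
    }

    /** Counts the "em"/"strong" siblings that directly follow start. */
    static int countInlineAfter(Item start) {
        int count = 0;
        Item aux = start.nextSibling();
        // Test aux itself for null before calling anything on it.
        while (aux != null && (aux.name.equals("em") || aux.name.equals("strong"))) {
            count++;
            aux = aux.nextSibling(); // may become null; the guard above handles it
        }
        return count;
    }
}
```

The loop advances the cursor first and checks it at the top of the next iteration, so reaching the end of the sibling list simply ends the loop.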

1 answer:

Answer 0: (score: 1)

Thanks, everyone. I managed to solve the problem.

Class NodesToProcess (the NewNode class it uses is not shown):

public class NodesToProcess {

    private Node oldNode;
    private NewNode newNode;
    private List<Node> toRemove;
    public NodesToProcess(Node oldNode, NewNode newNode, List<Node> toRemove) {
        this.oldNode = oldNode;
        this.newNode = newNode;
        this.toRemove = toRemove;
    }

    public Node getOldNode() {
        return oldNode;
    }

    public Node getNewNode() {
        return newNode.getNewNode();
    }

    public List<Node> getToRemove() {
        return toRemove;
    }

}

The method that wraps the unwrapped text:

private void wrapUnwrappedTextInTagP(Element element) {
    List<NodesToProcess> nodesToProcesses = new ArrayList<>();
    List<Node> nodeAlreadyUsed = new ArrayList<>();

    // First pass: only collect the groups; the tree is not modified while iterating.
    element.childNodes().forEach(node -> {
        if (node instanceof TextNode && !nodeAlreadyUsed.contains(node)) {
            List<Node> newChilds = new ArrayList<>();
            List<Node> toRemove = new ArrayList<>();

            newChilds.add(node);
            nodeAlreadyUsed.add(node);
            Node auxNode = node.nextSibling();

            // Collect the following siblings for as long as they belong to the same group.
            while (auxNode != null && parentIsBodyAndIsAnTextElement(auxNode)) {
                newChilds.add(auxNode);
                nodeAlreadyUsed.add(auxNode);
                toRemove.add(auxNode);
                auxNode = auxNode.nextSibling();
            }
            nodesToProcesses.add(new NodesToProcess(node, new NewNode(newChilds), toRemove));
        }
    });

    // Second pass: replace the first node of each group with the new node, then remove the rest.
    nodesToProcesses.forEach(nodesToProcess -> {
        nodesToProcess.getOldNode().replaceWith(nodesToProcess.getNewNode());
        nodesToProcess.getToRemove().forEach(node -> node.remove());
    });
}
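
The idea behind the method above (first collect runs of text and inline nodes, then build the wrapped result) can be tried without jsoup. The following is a minimal sketch under stated assumptions: `Child`, `isLoose`, and `wrapLooseRuns` are hypothetical names, the inline tags are taken to be `em` and `strong` as in the question, and, unlike the answer's method, a run may also start at an inline element rather than only at a text node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Jsoup-free model of the grouping step; all names here are hypothetical. */
final class LooseTextGrouper {

    /** A child of the container; tag is null for a text node. */
    static final class Child {
        final String tag;   // null means "text node"
        final String html;  // outer HTML of the child

        Child(String tag, String html) {
            this.tag = tag;
            this.html = html;
        }
    }

    private static final List<String> INLINE_TAGS = Arrays.asList("em", "strong");

    /** Text nodes and inline elements may join a run; anything else ends it. */
    static boolean isLoose(Child c) {
        return c.tag == null || INLINE_TAGS.contains(c.tag);
    }

    /** Joins each run of consecutive loose children and wraps it in a p element. */
    static List<String> wrapLooseRuns(List<Child> children) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Child c : children) {
            if (isLoose(c)) {
                run.append(c.html);      // extend the current run
            } else {
                flush(run, out);         // a block element closes the run
                out.add(c.html);         // block elements are kept unchanged
            }
        }
        flush(run, out);                 // close a run that reaches the end
        return out;
    }

    /** Emits the pending run, if it is not blank, and resets the buffer. */
    private static void flush(StringBuilder run, List<String> out) {
        String text = run.toString().trim();
        if (!text.isEmpty()) {
            out.add("<p>" + text + "</p>");
        }
        run.setLength(0);
    }
}
```

Whitespace-only runs, such as the indentation between elements, are dropped here so that no empty paragraphs are produced; the answer's method does not state how it treats them.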

So, in the main method:

Document doc = Jsoup.parse(html);
wrapUnwrappedTextInTagP(doc.body());