How to use the :empty pseudo-selector in jsoup

Time: 2016-08-17 06:53:47

Tags: java html jsoup

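I want to select the div tags that contain no div or any other tag. I tried the code below and expected the output to be "This is output", but the :empty pseudo-selector doesn't work.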

String htmlString =
        "<html><div><div><div><p><b>This is first line</b></p>   </div><b>This is second line</b></div><div>This is output</div><div><span style=\"color:blue\">This is third line</span></div></html>";

org.jsoup.nodes.Document doc1 = Jsoup.parse(htmlString);

Elements elements1 = doc1.select("html:empty");

for (Element element : elements1) {
    System.out.println(element.toString());
}
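Why the attempt above prints nothing: in jsoup, `:empty` matches only elements that have no child nodes at all, and a text node counts as a child, so `<div>This is output</div>` is not empty. `html:empty` also targets the `<html>` element itself, which the parser always gives `<head>` and `<body>` children. The small sketch below illustrates this on three hypothetical divs (the ids `a`, `b`, `c` and the class name are made up for illustration).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class EmptySelectorDemo {

    // Hypothetical input: a div with no children, a text-only div, and a div with a child element.
    static final String HTML =
            "<div id=a></div><div id=b>text only</div><div id=c><span>child</span></div>";

    // Collects the id of every element the query matches, in document order.
    static List<String> matchedIds(String html, String query) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        for (Element e : doc.select(query)) {
            ids.add(e.id());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Only the div with no child nodes at all is :empty -> [a]
        System.out.println(matchedIds(HTML, "div:empty"));
        // The text node makes div b non-empty, just like div c -> [b, c]
        System.out.println(matchedIds(HTML, "div:not(:empty)"));
        // <html> always has <head> and <body> after parsing -> []
        System.out.println(matchedIds(HTML, "html:empty"));
    }
}
```

So `:empty` cannot express "a div whose only content is text"; that needs either a different selector or manual filtering, as the answers below do.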

2 answers:

Answer 0 (score: 1)

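Since you have recently posted several similar questions in which your HTML structure changed and the CSS selectors broke, it may be better for you to avoid CSS selectors and process/filter the elements yourself: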

String htmlString = "<html><p><b>This has no div</b></p><div><div><div><p><b>This is first line</b></p></div><b>This is second line</b></div><div>This is output</div><div><span style=\"color:blue\">This is third line</span></div></html>";

Document doc = Jsoup.parse(htmlString);

Elements elements = doc.getAllElements();

// for all textnodes
outerloop:
for (Element element : elements) {
    if(element.childNodes().size()>0 && element.childNode(0).nodeName().equals("#text")){
        Element divContent = element;

        if(divContent.nodeName().equals("div")){
            System.out.println("No element in div; text: " + element.text()+ "\n");
        }else{  
            while(divContent.parents().size()>0 && !divContent.parent().nodeName().equals("div")){
                divContent = divContent.parent();
                if(divContent.parent().nodeName().equals("body")){
                    continue outerloop; // continue, to skip element <p><b>This has no div</b></p>
                    //break; // break, if you want the element <p><b>This has no div</b></p> anyway 
                }
            }

            System.out.println("element: " + divContent.toString());
            System.out.println("text: " + element.text() + "\n");
        }
    }
}

// only for <div>text...</div>
for (Element element : elements) {
    if(element.childNodes().size()>0 && element.childNode(0).nodeName().equals("#text") && element.nodeName().equals("div")){
        System.out.println("text: " + element.text());
    }
}

Output:

element: <p><b>This is first line</b></p>
text: This is first line

element: <b>This is second line</b>
text: This is second line

No element in div; text: This is output

element: <span style="color:blue">This is third line</span>
text: This is third line

text: This is output
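If only the `<div>text...</div>` case from the second loop is needed, the same manual filtering can be written more compactly. This is a minimal sketch, assuming "no other tag" means "no child elements": it keeps each `<div>` whose `children()` list is empty and whose `ownText()` is non-empty. The class and method names are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LeafDivFilter {

    // The HTML used in the answer above.
    static final String HTML = "<html><p><b>This has no div</b></p><div><div><div><p><b>This is first line</b></p></div>"
            + "<b>This is second line</b></div><div>This is output</div>"
            + "<div><span style=\"color:blue\">This is third line</span></div></html>";

    // Keeps every <div> that has no child elements but carries its own text.
    static List<String> leafDivTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element div : doc.getElementsByTag("div")) {
            if (div.children().isEmpty() && !div.ownText().isEmpty()) {
                texts.add(div.ownText());
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : leafDivTexts(HTML)) {
            System.out.println("text: " + text);
        }
    }
}
```

Checking `children()` rather than only the first child node also rejects a div such as `<div>text<b>bold</b></div>`, which the second loop above would accept because its first node happens to be text.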

Answer 1 (score: 0)

I tried this, and it works:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class Test {
    public static void main(String[] args) {
        String htmlString =
                "<html>" +
                        "<div><div>" +
                        "<div><p><b>This is first line</b></p>   </div>" +
                        "<b>This is second line</b></div><div>This is output</div>" +
                        "<div><span style=\"color:blue\">This is third line</span></div></html>";

        org.jsoup.nodes.Document doc1 = Jsoup.parse(htmlString);

        for (Element e : doc1.select("div:not(b),div:not(p),div:not(span)"))
            System.out.println(e.ownText());
    }
}

Output:

This is output
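One caveat about the selector above: `div:not(b)` is true for every `<div>` (a `<div>` is never a `<b>`), so the comma-separated selector matches all five divs in the question's HTML. The expected text appears only because the other four divs have no own text, so `ownText()` prints them as blank lines. A sketch of a selector that targets only divs without descendant elements, assuming jsoup's `:has()` and `:not()` pseudo-selectors, is `div:not(:has(*))`; the class and method names below are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class NoChildElementDivs {

    // The HTML from the question.
    static final String HTML = "<html>"
            + "<div><div>"
            + "<div><p><b>This is first line</b></p>   </div>"
            + "<b>This is second line</b></div><div>This is output</div>"
            + "<div><span style=\"color:blue\">This is third line</span></div></html>";

    // Own text of every <div> that has no descendant elements.
    static List<String> leafDivTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element div : doc.select("div:not(:has(*))")) {
            texts.add(div.ownText());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // div:not(b) holds for every <div>, so this matches all five divs
        Elements all = doc.select("div:not(b),div:not(p),div:not(span)");
        System.out.println(all.size());
        // Only <div>This is output</div> has no child elements
        leafDivTexts(HTML).forEach(System.out::println);
    }
}
```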