How do I iterate over the various elements in jsoup?

Date: 2015-08-20 14:19:44

Tags: java android parsing

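I have to parse a page with jsoup. The page contains a div with a class, and inside it are various elements such as p, h1, h2 and h3 tags. I want to go through them one by one and process each of them. The page looks like this: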

    <div class="pf-content">
        <p>For centuries, Spain shone and progressed under Muslim rule. Unfortunately, the city of Seville fell prey to the barbaric onslaught of the Kingdom of Castile in the year 1248. Several innocent Spaniards were killed, many were forced to leave their homeland and seek refuge elsewhere, whereas many others were captured and taken as slaves. The rulers of Castile further destroyed remnants of Islamic life and culture, <a href="https://muslimmemo.com/masjids-spain/">including masjids</a>.</p>
        <h3>Original Arabic Text</h3>
        <h4>Original Arabic Text</h4>
    </div>

The order in which p, h3, h4, etc. appear matters, because I have to render them into Android TextViews.

What I can do so far is:

    Document document = Jsoup.connect("page link here").get();

    Elements pTag = document.select("div.pf-content");

But how do I proceed from here? Please help.

What I tried:

    Elements elements = document.select("div.pf-content");

    for (Element element : elements) {
        Log.d("FullContent", "elements are: " + element);
        if (element.select("p").first() != null) {
            Log.d("FullContent", "a p tag");
            if (element.select("p").first().select("img").first() != null) {
                Log.d("FullContent", "the tag " + "has src");
            }
        } else if (element.select("h1").first() != null) {
            Log.d("FullContent", "a h1 tag");
        } else if (element.select("h2").first() != null) {
            Log.d("FullContent", "a h2 tag");
        } else if (element.select("h3").first() != null) {
            Log.d("FullContent", "a h3 tag");
        } else if (element.select("h4").first() != null) {
            Log.d("FullContent", "a h4 tag");
        } else {
            Log.d("FullContent", "other tag");
        }
    }

1 Answer:

Answer 0 (score: 1)

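Once you have found the container with Elements pTag = document.select("div.pf-content");, you can iterate over its direct children, which children() returns in document order: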

    Elements elements = pTag.first().children();
    for (Element e : elements) {
        // Do something with each element
    }
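To tell the children apart while keeping their order, one option is to branch on each child's tagName(). The following is only a minimal sketch: the class name PfContentWalker and the method describeChildren are made up for illustration, and it parses an inline HTML string with Jsoup.parse instead of fetching the real page with Jsoup.connect, so that it runs without network access.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of jsoup or of the original question.
public class PfContentWalker {

    // Returns one "tag: text" entry per direct child of div.pf-content,
    // in the order the children appear in the document.
    static List<String> describeChildren(String html) {
        Document document = Jsoup.parse(html);
        Element container = document.select("div.pf-content").first();
        List<String> result = new ArrayList<>();
        if (container == null) {
            return result; // no div.pf-content in this document
        }
        for (Element child : container.children()) {
            String tag = child.tagName();
            switch (tag) {
                case "p":
                    // Check whether this paragraph contains an image.
                    boolean hasImage = child.select("img").first() != null;
                    result.add("p" + (hasImage ? " (has img)" : "") + ": " + child.text());
                    break;
                case "h1":
                case "h2":
                case "h3":
                case "h4":
                    result.add(tag + ": " + child.text());
                    break;
                default:
                    result.add("other <" + tag + ">: " + child.text());
                    break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the page shown in the question.
        String html = "<div class=\"pf-content\">"
                + "<p>First paragraph</p>"
                + "<h3>Original Arabic Text</h3>"
                + "<h4>Original Arabic Text</h4>"
                + "</div>";
        for (String line : describeChildren(html)) {
            System.out.println(line);
        }
    }
}
```

On Android, each branch would presumably create or fill a TextView instead of appending to a list; that part depends on the app's layout and is left out here.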