JSoup:如何列出列表中的链接?

时间:2019-01-19 08:56:39

标签: html dom xhtml jsoup element

我如何list链接,但仅来自div标签?更具体地说,仅list中的div?以某种方式将selection限制为特定元素?

代码:

package my.books;

import java.io.File;
import java.net.URI;
import java.util.Properties;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {

    private static final Logger LOG = Logger.getLogger(App.class.getName());
    private Properties properties = new Properties();

    public static void main(String[] args) throws Exception {
        new App().basicJSoup();
    }

    private void basicJSoup() throws Exception {
        properties.loadFromXML(App.class.getResourceAsStream("/properties.xml"));
        LOG.fine(properties.toString());
        URI inputURI = new URI(properties.getProperty("html_input"));
        URI outputURI = new URI(properties.getProperty("output"));

        File input = new File(inputURI);
        Document doc = Jsoup.parse(input, "UTF-8");
        Element sideCategories = doc.select("div.side_categories").first();
        LOG.fine(sideCategories.outerHtml());

        Elements ul = doc.select("div.side_categories > ul");
        Elements li = ul.select("li");

        for (int i = 0; i < li.size(); i++) {
            LOG.info(li.get(i).text());
            LOG.info("i\t\t" + i);
        }
    }

}

1 个答案:

答案 0 :(得分:1)

如果我正确理解了您的问题,则只需要编写完整的特定css选择器即可,例如div.side_categories ul li a

例如:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupTest {
    public static void main(String[] args) {
        String markup =
                "<div class=\"side_categories\">" +
                  "<ul>" +
                    "<li>" +
                      "<a href=\"#\">Link 1</a>" +
                    "</li>" +
                    "<li>" +
                      "<a href=\"#\">Link 2</a>" +
                    "</li>" +
                  "</ul>" +
                "</div>";

        Document doc = Jsoup.parse(markup);
        Elements links = doc.select("div.side_categories ul li a");

        for (Element link : links) {
            System.out.println(link);
        }
    }
}

结果:

<a href="#">Link 1</a>
<a href="#">Link 2</a>