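Web scraping with the Jsoup library to get data from a given table

Time: 2016-12-26 19:56:16

Tags: java web-scraping jsoup

I am trying to scrape some data from a web page, but I cannot get it to work. I tried doing it with substring(), but that is very inefficient. Here is part of the code I wrote: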

    Elements links;
    Element link;
    String url = "https://www.premierleague.com/tables";
    // Fetch and parse the page
    Document document = Jsoup.connect(url).get();
    links = document.select("table");
    // Take the first table on the page, then its second row (index 1)
    org.jsoup.nodes.Element table = document.select("table").get(0);
    Elements rows = table.select("tr");
    org.jsoup.nodes.Element row = rows.get(1);
    // All cells of that row
    Elements cols = row.select("td");

Could someone help me with some examples that use the same link?

1 Answer:

Answer 0: (score: 3)

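    // Needs java.util.Iterator, org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
    String url = "https://www.premierleague.com/tables";
    Document doc = Jsoup.connect(url).get(); // throws IOException
    Element table = doc.select("table").first();
    // One iterator per column; each next() moves to the next matching cell
    Iterator<Element> team = table.select("td[class=team]").iterator();
    Iterator<Element> rank = table.select("td[id=tooltip]").iterator();
    Iterator<Element> points = table.select("td[class=points]").iterator();
    // Print the first team, its rank, and its points
    System.out.println(team.next().text());
    System.out.println(rank.next().text());
    System.out.println(points.next().text());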

Output:

ChelseaCHE
1 Previous Position 1
46
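
The output above shows that text() runs the club name and its abbreviation together ("ChelseaCHE") and that the rank cell also carries the "Previous Position" text. Here is a minimal sketch of tidying those strings with plain Java, assuming the abbreviation is always three trailing uppercase letters and the rank is always the first whitespace-separated token. The class and method names are made up for illustration and are not part of Jsoup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellTextCleaner {
    // Assumption: the cell text is the club name immediately followed by a
    // three-letter uppercase abbreviation, e.g. "ChelseaCHE".
    private static final Pattern TEAM = Pattern.compile("^(.*?)([A-Z]{3})$");

    /** Returns {name, abbreviation}, or {text, ""} if the text does not fit the assumed shape. */
    public static String[] splitTeam(String text) {
        String trimmed = text.trim();
        Matcher m = TEAM.matcher(trimmed);
        if (m.matches()) {
            return new String[] { m.group(1).trim(), m.group(2) };
        }
        return new String[] { trimmed, "" };
    }

    /** Returns the leading integer of text such as "1 Previous Position 1". */
    public static int parseRank(String text) {
        String firstToken = text.trim().split("\\s+")[0];
        return Integer.parseInt(firstToken);
    }

    public static void main(String[] args) {
        String[] team = splitTeam("ChelseaCHE");
        System.out.println(team[0] + " (" + team[1] + ")"); // prints "Chelsea (CHE)"
        System.out.println(parseRank("1 Previous Position 1")); // prints "1"
    }
}
```

If the page nests the name and the abbreviation in separate child elements, selecting those children directly would likely be more robust than splitting the text afterwards; I have not checked the live markup for that.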

Edit, in answer to your question:

        // Print the first row
        System.out.println(team.next().text());
        System.out.println(rank.next().text());
        System.out.println(points.next().text());

        // Skip the next three rows by advancing each iterator three times
        team.next();
        team.next();
        team.next();

        rank.next();
        rank.next();
        rank.next();

        points.next();
        points.next();
        points.next();

        // Print the row that follows the skipped ones
        System.out.println(team.next().text());
        System.out.println(rank.next().text());
        System.out.println(points.next().text());

Output:

ChelseaCHE
1 Previous Position 1
46
Tottenham HotspurTOT
5 Previous Position 5
33
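
Calling next() by hand gets tedious once you want more than a couple of rows. A rough sketch of looping over the rows instead is shown here. To keep it runnable without a network connection it parses a small inline HTML string, which is hypothetical markup that only imitates the team and points class names used above; the real page may be structured differently. It also uses the td.team class selector rather than td[class=team], so a cell that has additional classes would still match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TableRows {
    // Hypothetical sample markup, not copied from the live page.
    public static final String SAMPLE_HTML =
        "<table>"
      + "<tr><th>Club</th><th>Pts</th></tr>"
      + "<tr><td class=\"team\">Chelsea</td><td class=\"points\">46</td></tr>"
      + "<tr><td class=\"team\">Tottenham Hotspur</td><td class=\"points\">33</td></tr>"
      + "</table>";

    /** Returns one "team: points" string per data row of the first table. */
    public static List<String> readRows(String html) {
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table").first();
        List<String> result = new ArrayList<>();
        if (table == null) {
            return result; // no table in the document
        }
        for (Element row : table.select("tr")) {
            Element team = row.select("td.team").first();
            Element points = row.select("td.points").first();
            if (team == null || points == null) {
                continue; // header row or a row without both cells
            }
            result.add(team.text() + ": " + points.text());
        }
        return result;
    }

    public static void main(String[] args) {
        readRows(SAMPLE_HTML).forEach(System.out::println);
    }
}
```

To run it against the real site you would pass the HTML from Jsoup.connect(url).get() instead, for example by changing readRows to accept a Document. Reading both cells from the same row also keeps the team and points values paired, which separate iterators only guarantee if every row has exactly one matching cell for each selector.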