How to scrape data with Jsoup from pages rendered by JS

Time: 2018-11-27 02:25:14

Tags: java jsoup

I am currently scraping the links on this site so that I can visit each individual item:

https://southwesthumane.org/adopt/dogs/

However, this URL has several pages of results, and the paging is driven by JS. The page source for the pager looks like this:

<span id="ContentPlaceHolder_Item3_AdoptionDogs_2_dpDogs"><a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl00$ctl00&#39;,&#39;&#39;)">Previous</a>&nbsp;<a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl01$ctl00&#39;,&#39;&#39;)">1</a>&nbsp;<span>2</span>&nbsp;<a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl01$ctl02&#39;,&#39;&#39;)">3</a>&nbsp;<a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl01$ctl03&#39;,&#39;&#39;)">4</a>&nbsp;<a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl01$ctl04&#39;,&#39;&#39;)">5</a>&nbsp;<a href="javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl02$ctl00&#39;,&#39;&#39;)">Next</a>&nbsp;</span>
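The markup above looks like a standard ASP.NET WebForms pager: each link calls `__doPostBack(target, argument)` instead of pointing at a URL, so there is no `href` to follow directly. A first step toward the other pages could be to pull the event target out of each link. The sketch below does that with a regular expression from the standard library. The names `PagerTargets` and `extract` are hypothetical, chosen only for illustration, and the pattern assumes the pager keeps exactly the shape shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PagerTargets {
    // Matches <a href="javascript:__doPostBack('target','argument')">label</a>.
    // The quotes may appear raw (') or HTML-escaped (&#39;), so both are accepted.
    private static final Pattern POSTBACK = Pattern.compile(
            "<a\\s+href=\"javascript:__doPostBack\\((?:'|&#39;)(.*?)(?:'|&#39;),"
            + "(?:'|&#39;)(.*?)(?:'|&#39;)\\)\">(.*?)</a>");

    /** Returns link label -> event target, in document order. */
    static Map<String, String> extract(String pagerHtml) {
        Map<String, String> targets = new LinkedHashMap<>();
        Matcher m = POSTBACK.matcher(pagerHtml);
        while (m.find()) {
            targets.put(m.group(3), m.group(1));
        }
        return targets;
    }

    public static void main(String[] args) {
        // Two links copied from the pager markup in the question.
        String html =
                "<a href=\"javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3"
                + "$AdoptionDogs_2$dpDogs$ctl01$ctl02&#39;,&#39;&#39;)\">3</a>&nbsp;"
                + "<a href=\"javascript:__doPostBack(&#39;ctl00$ctl00$ContentPlaceHolder$Item3"
                + "$AdoptionDogs_2$dpDogs$ctl02$ctl00&#39;,&#39;&#39;)\">Next</a>";
        for (Map.Entry<String, String> e : extract(html).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

With Jsoup, the input string could come from something like `dogs.select("span[id$=dpDogs]").outerHtml()`. Jsoup decodes `&#39;` while parsing, which is why the pattern accepts plain single quotes as well.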

Right now I only scrape data from the first page, and I don't know how to reach the remaining pages to scrape them as well.

Here is my code so far:

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.nodes.Node;
    import org.jsoup.select.Elements;

    public class DogScraper {
        public static void main(String[] args) {
            try {
                // A plain GET only returns the first page of the listing.
                Document dogs = Jsoup.connect("https://southwesthumane.org/adopt/dogs/").get();
                Elements linksDogs = dogs.select(":containsOwn(Details »)");

                //***********************DOGS*****************************
                for (Element link : linksDogs) {
                    String url = "https://southwesthumane.org" + link.attr("href");
                    System.out.println("\nurl: " + url);
                    try {
                        int index = 0;
                        Document dog = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
                        Elements name = dog.select("h3");
                        Elements description = dog.select("div.Animaldetails");
                        Elements details = dog.select("div.AnimalDetails > strong");
                        // The dot is escaped so that it only matches a literal "." before the extension.
                        Elements img = dog.select("img[src~=\\.(jpg|jpeg)]");
                        // Only every second <h3> (odd index) is printed as the name.
                        for (Element code : name) {
                            if (index % 2 == 1) {
                                System.out.println("Name: " + code.text());
                            }
                            index++;
                        }
                        for (Element code : img) {
                            System.out.println("Image: " + code.attr("src"));
                        }
                        for (Element code : description) {
                            System.out.println("Description: " + code.select("p").text());
                        }
                        // Each <strong> label is followed by its value as a sibling node.
                        for (Element code : details) {
                            Node value = code.nextSibling(); // null if the label is the last node
                            System.out.println(code.text() + " " + (value != null ? value.toString() : ""));
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

For example, there are currently 5 pages and I only reach the first one; I want to access all the remaining pages too.
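One possible direction, assuming the site is a standard ASP.NET WebForms page: `__doPostBack` normally writes its two arguments into the hidden fields `__EVENTTARGET` and `__EVENTARGUMENT` and then submits the page's form along with the other hidden inputs, such as `__VIEWSTATE`. Reproducing that form submission as a POST might return the requested page. The sketch below only assembles the form data, using the standard library. `PostBackForm`, `fields`, and `encode` are hypothetical names, and the `__VIEWSTATE` value in `main` is a placeholder rather than real data from the site.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class PostBackForm {
    /**
     * Copies the hidden inputs of the current page and sets the two fields
     * that __doPostBack would fill in before submitting the form.
     */
    static Map<String, String> fields(Map<String, String> hiddenInputs,
                                      String eventTarget, String eventArgument) {
        Map<String, String> form = new LinkedHashMap<>(hiddenInputs);
        form.put("__EVENTTARGET", eventTarget);
        form.put("__EVENTARGUMENT", eventArgument);
        return form;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded request body. */
    static String encode(Map<String, String> form) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : form.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("__VIEWSTATE", "PLACEHOLDER"); // assumption: the real value comes from the fetched page
        // Event target of the "3" link in the pager markup from the question.
        String target = "ctl00$ctl00$ContentPlaceHolder$Item3$AdoptionDogs_2$dpDogs$ctl01$ctl02";
        System.out.println(encode(fields(hidden, target, "")));
    }
}
```

With Jsoup, the hidden inputs could be collected from the first page via `dogs.select("input[type=hidden]")`, and the resulting map sent with `Jsoup.connect(url).data(form).post()`. Whether this server accepts such a request has not been verified here; the hidden fields would likely need to be re-read from each response before requesting the next page.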

0 answers:

No answers yet