JSoup does not correctly extract elements by class

Time: 2016-09-17 20:34:38

Tags: android html web-scraping jsoup href

I have the following element in a web page:

<div id="pnNij" class="post" data-tag1="" data-tag2="">
    <a class="image-list-link" href="http://imgur.com/gallery/pnNij" data-page="0">
        <img alt="" src="./Imgur_ The most awesome images on the Internet_files/H7fZCNgb.jpg">


            <div class="point-info gradient-transparent-black transition">
                <div class="relative">
                    <div class="pa-bottom">
                        <div class="arrows">
                            <div title="like" class="pointer arrow-up icon-upvote-outline" data="pnNij" type="image" data-up="4212"></div>
                            <div title="dislike" class="pointer arrow-down icon-downvote-outline" data="pnNij" type="image" data-downs="502"></div>
                            <div class="clear"></div>
                        </div>

                        <div class="point-info-points" title="points">
                            <span class="points-pnNij">3,710</span>
                            <span class="points-text-pnNij">points</span>
                        </div>
                    </div>
                </div>
            </div>

    </a>
    <div class="hover">
                    <p>Seems like 2017 has it all...</p>


        <div class="post-info">
            album · 69,542 views
        </div>
    </div>

</div>

Note that the href is http://imgur.com/gallery/pnNij

However, when I use JSoup to extract the elements from the page:

docImgur = Jsoup.connect("http://imgur.com/").get();
Elements links = docImgur.getElementsByClass("post");

the elements are extracted almost correctly, except that the href attribute is /gallery/pnNij/

Why doesn't the href attribute contain the full URL?

1 answer:

Answer 0 (score: 0)

When you inspect the page source, you will find:

<a class="image-list-link" href="/gallery/WRzti" data-page="0">
    ...
</a>

So the href attribute in the source is not absolute, and /gallery/WRzti is the expected result.

Solution

Use the abs: attribute prefix, which makes JSoup resolve the attribute value against the document's base URI.

Example

Document docImgur = Jsoup.connect("http://imgur.com/").get();
Elements links = docImgur.select("a[href].image-list-link");
for (Element element : links) {
    System.out.println(element.attr("abs:href"));
}

Output

http://imgur.com/gallery/WRzti
http://imgur.com/gallery/tCnDJ
http://imgur.com/gallery/JIHYh
...
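
To make the idea behind absolute-URL resolution concrete, here is a minimal sketch using only the Java standard library (java.net.URI), with no JSoup dependency. It is not JSoup's implementation; the class name AbsUrlSketch and the method resolve are hypothetical names chosen for illustration. It only shows the general principle of resolving an href against the base URL of the page.

```java
import java.net.URI;

// Hypothetical helper class, not part of JSoup: illustrates how a relative
// href can be resolved against the URL the page was loaded from.
class AbsUrlSketch {

    // Resolves an href (relative or already absolute) against the base URL.
    static String resolve(String baseUrl, String href) {
        return URI.create(baseUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://imgur.com/";
        // A root-relative href, as found in the page source.
        System.out.println(resolve(base, "/gallery/WRzti"));
        // An href that is already absolute is returned unchanged.
        System.out.println(resolve(base, "http://imgur.com/gallery/pnNij"));
    }
}
```

In JSoup itself, element.absUrl("href") is an alternative to element.attr("abs:href") that should give the same result, provided the document has a base URI (which it does when loaded via Jsoup.connect).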