Jsoup selector to parse an element's href and title

Date: 2018-06-23 02:10:50

Tags: jsoup

HTML from Gutenberg:

I want to parse HTML like this with Jsoup and get the link title and href:

   <li class="booklink">
     <a class="table link" href="/ebooks/4300.mobile" accesskey="5">
       <span class="row">
         <span class="cell leftcell">
           <span class="icon icon_book"></span>
         </span>
         <span class="cell content">
           <span class="title">Ulysses</span>
           <span class="subtitle">James Joyce</span>
           <span class="extra">7824 downloads</span>
         </span>
         <span class="cell rightcell">
           <span class="icon icon_next"></span>
         </span>
       </span>
     </a>
   </li>

I tried many approaches, but none of them worked.

2 answers:

Answer 0: (score: 0)

   // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.nodes.Element;
   // Jsoup.connect(...).get() throws java.io.IOException.
   Document doc = Jsoup.connect("https://www.gutenberg.org/ebooks/search/?sort_order=downloads").get();

   // Select elements with class "link" that are direct children of an element with class "booklink"
   for (Element e : doc.select(".booklink > .link")) {
       // Inside the selected link, take the text of the element with class "title"
       System.out.println("title: " + e.select(".title").text());
       // The "abs:" prefix resolves the href attribute to an absolute URL
       System.out.println("url: " + e.attr("abs:href"));
   }
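
The selector from this answer can also be checked offline against the fragment from the question, which makes it easier to see what `.booklink > .link` matches and how the relative href is resolved. The following is only a minimal sketch: the class name `BookLinkDemo`, the helper `firstBook`, and the constant `SAMPLE_HTML` are made up for illustration, the fragment is shortened and wrapped in an `<ol>`, and the base URI passed to `Jsoup.parse` is an assumption that stands in for the page URL.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookLinkDemo {

    // Shortened fragment from the question, wrapped in <ol> (illustrative only).
    static final String SAMPLE_HTML =
          "<ol><li class=\"booklink\">"
        + "<a class=\"table link\" href=\"/ebooks/4300.mobile\" accesskey=\"5\">"
        + "<span class=\"row\"><span class=\"cell content\">"
        + "<span class=\"title\">Ulysses</span>"
        + "<span class=\"subtitle\">James Joyce</span>"
        + "<span class=\"extra\">7824 downloads</span>"
        + "</span></span></a></li></ol>";

    // Hypothetical helper: returns {title, absolute URL} of the first book link, or null if none matches.
    static String[] firstBook(String html, String baseUri) {
        // The base URI is what relative hrefs are resolved against.
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select(".booklink > .link").first();
        if (link == null) {
            return null;
        }
        return new String[] { link.select(".title").text(), link.absUrl("href") };
    }

    public static void main(String[] args) {
        String[] book = firstBook(SAMPLE_HTML, "https://www.gutenberg.org/");
        System.out.println("title: " + book[0]); // title: Ulysses
        System.out.println("url: " + book[1]);   // url: https://www.gutenberg.org/ebooks/4300.mobile
    }
}
```

When the document is parsed from a string without a base URI, `absUrl("href")` and `attr("abs:href")` return an empty string for relative links, so the second argument to `Jsoup.parse` matters here.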

Answer 1: (score: 0)

Try this:

import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookScraper {

    public static void main(String[] args) throws IOException {

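        // Fetch the mobile search results page for the query "ulysses"
        Document document = Jsoup.connect("https://m.gutenberg.org/ebooks/search.mobile/?query=ulysses").get();
        // One <li> per book; [class=booklink] matches only when the class attribute is exactly "booklink"
        List<Element> bookLinks = document.select("body > div.content > ol > li[class=booklink]");

        for (Element bookLink : bookLinks) {

            // The <a class="table link"> carries the href; absUrl resolves it against the page URL
            String href = bookLink.select(".table.link").get(0).absUrl("href");
            // Title, author and download count sit in spans inside the "cell content" span
            String title = bookLink.select(".cell.content .title").text();
            String subTitle = bookLink.select(".cell.content .subtitle").text();
            String extra = bookLink.select(".cell.content .extra").text();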

            System.out.println("Link : " + href);
            System.out.println("    Title    : " + title);
            System.out.println("    Subtitle : " + subTitle);
            System.out.println("    Info     : " + extra);
        }
    }

}

Sample output:

Link : https://m.gutenberg.org/ebooks/4300.mobile
    Title    : Ulysses
    Subtitle : James Joyce
    Info     : 7824 downloads
Link : https://m.gutenberg.org/ebooks/4367.mobile
    Title    : Personal Memoirs of U. S. Grant, Complete
    Subtitle : Ulysses S. Grant
    Info     : 1459 downloads
Link : https://m.gutenberg.org/ebooks/20151.mobile
    Title    : Hidden Treasures; Or, Why Some Succeed While Others Fail
    Subtitle : Harry A. Lewis
    Info     : 199 downloads
Link : https://m.gutenberg.org/ebooks/32884.mobile
    Title    : Ideas of Good and Evil
    Subtitle : W. B. Yeats
    Info     : 143 downloads
Link : https://m.gutenberg.org/ebooks/35742.mobile
    Title    : American Leaders and Heroes: A preliminary text-book in United States History
    Subtitle : Wilbur F. Gordy
    Info     : 143 downloads
Link : https://m.gutenberg.org/ebooks/32326.mobile
    Title    : Tales of Troy and Greece
    Subtitle : Andrew Lang
    Info     : 118 downloads
Link : https://m.gutenberg.org/ebooks/7768.mobile
    Title    : The Adventures of Ulysses
    Subtitle : Charles Lamb
    Info     : 108 downloads
Link : https://m.gutenberg.org/ebooks/11490.mobile
    Title    : American Negro Slavery
    Subtitle : Ulrich Bonnell Phillips
    Info     : 102 downloads
Link : https://m.gutenberg.org/ebooks/17667.mobile
    Title    : Dialogues of the Dead
    Subtitle : Baron George Lyttelton Lyttelton and Mrs. Montagu
    Info     : 98 downloads
Link : https://m.gutenberg.org/ebooks/2851.mobile
    Title    : Sixes and Sevens
    Subtitle : O. Henry
    Info     : 97 downloads
Link : https://m.gutenberg.org/ebooks/32728.mobile
    Title    : The English in the West Indies; Or, The Bow of Ulysses
    Subtitle : James Anthony Froude
    Info     : 69 downloads
Link : https://m.gutenberg.org/ebooks/41935.mobile
    Title    : The Adventures of Ulysses the Wanderer
    Subtitle : Homer and Guy Thorne
    Info     : 67 downloads
Link : https://m.gutenberg.org/ebooks/32628.mobile
    Title    : The Child's Book of American Biography
    Subtitle : Mary Stoyell Stimpson
    Info     : 63 downloads
Link : https://m.gutenberg.org/ebooks/29659.mobile
    Title    : Manual of American Grape-Growing
    Subtitle : U. P. Hedrick
    Info     : 54 downloads
Link : https://m.gutenberg.org/ebooks/46327.mobile
    Title    : The Cherries of New York
    Subtitle : U. P. Hedrick
    Info     : 47 downloads
Link : https://m.gutenberg.org/ebooks/5860.mobile
    Title    : Personal Memoirs of U. S. Grant, Part 1.
    Subtitle : Ulysses S. Grant
    Info     : 46 downloads
Link : https://m.gutenberg.org/ebooks/51076.mobile
    Title    : Aaron Rodd, Diviner
    Subtitle : E. Phillips Oppenheim
    Info     : 34 downloads
Link : https://m.gutenberg.org/ebooks/45978.mobile
    Title    : The Grapes of New York
    Subtitle : U. P. Hedrick
    Info     : 33 downloads
Link : https://m.gutenberg.org/ebooks/46347.mobile
    Title    : Men of Our Times; Or, Leading Patriots of the Day
    Subtitle : Harriet Beecher Stowe
    Info     : 31 downloads
Link : https://m.gutenberg.org/ebooks/4546.mobile
    Title    : Memoirs of the Union's Three Great Civil War Generals
    Subtitle : Ulysses S. Grant, William T. Sherman, and Philip Henry Sheridan
    Info     : 30 downloads
Link : https://m.gutenberg.org/ebooks/47263.mobile
    Title    : The Peaches of New York
    Subtitle : U. P. Hedrick
    Info     : 30 downloads
Link : https://m.gutenberg.org/ebooks/39626.mobile
    Title    : An Alphabet of History
    Subtitle : Wilbur D. Nesbit
    Info     : 28 downloads
Link : https://m.gutenberg.org/ebooks/46994.mobile
    Title    : The Pears of New York
    Subtitle : U. P. Hedrick
    Info     : 27 downloads
Link : https://m.gutenberg.org/ebooks/43982.mobile
    Title    : Stories of the Old World
    Subtitle : Alfred John Church
    Info     : 26 downloads
Link : https://m.gutenberg.org/ebooks/28386.mobile
    Title    : Ulysses S. Grant
    Subtitle : Walter Allen
    Info     : 25 downloads