HTML from Project Gutenberg:
<!-- JAVASCRIPT LIBRARY CONFIGURATIONS -->
<script src="js/scripts/lib/uri-controler.js"></script>
<script src="js/scripts/lib/webstorage-controler.js"></script>
<script>
console.log(URIControler.URN)
</script>
I want to parse HTML like this with JSoup:
<li class="booklink">
<a class="table link" href="/ebooks/4300.mobile" accesskey="5">
<span class="row">
<span class="cell leftcell">
<span class="icon icon_book"></span>
</span>
<span class="cell content">
<span class="title">Ulysses</span>
<span class="subtitle">James Joyce</span>
<span class="extra">7824 downloads</span>
</span>
<span class="cell rightcell">
<span class="icon icon_next"></span>
</span>
</span>
</a>
</li>
Specifically, I want to get the link and its href.
I tried many approaches, but none of them worked.
Answer 0 (score: 0)
Document doc = Jsoup.connect("https://www.gutenberg.org/ebooks/search/?sort_order=downloads").get();
// Select elements with class "link" that are direct children of an element with class "booklink"
for (Element e : doc.select(".booklink > .link")) {
    // Within the selected link, take the text of the element with class "title"
    System.out.println("title: " + e.select(".title").text());
    // Read the href attribute, resolved to an absolute URL via the "abs:" prefix
    System.out.println("url: " + e.attr("abs:href"));
}
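For context, `abs:href` (like `absUrl("href")` in the other answer) gives you the href resolved against the URL the document was loaded from, which is why a relative value such as `/ebooks/4300.mobile` comes back as a full URL. A minimal sketch of that resolution step using only `java.net.URI`; the names `HrefResolver` and `resolve` are hypothetical and not part of jsoup, and jsoup's own resolution may differ in edge cases:

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of jsoup.
final class HrefResolver {

    // Resolves a possibly relative href against the URL of the page it was found on.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String pageUrl = "https://www.gutenberg.org/ebooks/search/?sort_order=downloads";
        // prints https://www.gutenberg.org/ebooks/4300.mobile
        System.out.println(resolve(pageUrl, "/ebooks/4300.mobile"));
    }
}
```

An href that is already absolute is returned unchanged, so the same call is safe for both kinds of links.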
Answer 1 (score: 0)
Try this:
import java.io.IOException;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BookScraper {

    public static void main(String[] args) throws IOException {
        Document document = Jsoup.connect("https://m.gutenberg.org/ebooks/search.mobile/?query=ulysses").get();
        List<Element> bookLinks = document.select("body > div.content > ol > li[class=booklink]");
        for (Element bookLink : bookLinks) {
            String href = bookLink.select(".table.link").get(0).absUrl("href");
            String title = bookLink.select(".cell.content .title").text();
            String subTitle = bookLink.select(".cell.content .subtitle").text();
            String extra = bookLink.select(".cell.content .extra").text();
            System.out.println("Link : " + href);
            System.out.println(" Title : " + title);
            System.out.println(" Subtitle : " + subTitle);
            System.out.println(" Info : " + extra);
        }
    }
}
Sample output:
Link : https://m.gutenberg.org/ebooks/4300.mobile
Title : Ulysses
Subtitle : James Joyce
Info : 7824 downloads
Link : https://m.gutenberg.org/ebooks/4367.mobile
Title : Personal Memoirs of U. S. Grant, Complete
Subtitle : Ulysses S. Grant
Info : 1459 downloads
Link : https://m.gutenberg.org/ebooks/20151.mobile
Title : Hidden Treasures; Or, Why Some Succeed While Others Fail
Subtitle : Harry A. Lewis
Info : 199 downloads
Link : https://m.gutenberg.org/ebooks/32884.mobile
Title : Ideas of Good and Evil
Subtitle : W. B. Yeats
Info : 143 downloads
Link : https://m.gutenberg.org/ebooks/35742.mobile
Title : American Leaders and Heroes: A preliminary text-book in United States History
Subtitle : Wilbur F. Gordy
Info : 143 downloads
Link : https://m.gutenberg.org/ebooks/32326.mobile
Title : Tales of Troy and Greece
Subtitle : Andrew Lang
Info : 118 downloads
Link : https://m.gutenberg.org/ebooks/7768.mobile
Title : The Adventures of Ulysses
Subtitle : Charles Lamb
Info : 108 downloads
Link : https://m.gutenberg.org/ebooks/11490.mobile
Title : American Negro Slavery
Subtitle : Ulrich Bonnell Phillips
Info : 102 downloads
Link : https://m.gutenberg.org/ebooks/17667.mobile
Title : Dialogues of the Dead
Subtitle : Baron George Lyttelton Lyttelton and Mrs. Montagu
Info : 98 downloads
Link : https://m.gutenberg.org/ebooks/2851.mobile
Title : Sixes and Sevens
Subtitle : O. Henry
Info : 97 downloads
Link : https://m.gutenberg.org/ebooks/32728.mobile
Title : The English in the West Indies; Or, The Bow of Ulysses
Subtitle : James Anthony Froude
Info : 69 downloads
Link : https://m.gutenberg.org/ebooks/41935.mobile
Title : The Adventures of Ulysses the Wanderer
Subtitle : Homer and Guy Thorne
Info : 67 downloads
Link : https://m.gutenberg.org/ebooks/32628.mobile
Title : The Child's Book of American Biography
Subtitle : Mary Stoyell Stimpson
Info : 63 downloads
Link : https://m.gutenberg.org/ebooks/29659.mobile
Title : Manual of American Grape-Growing
Subtitle : U. P. Hedrick
Info : 54 downloads
Link : https://m.gutenberg.org/ebooks/46327.mobile
Title : The Cherries of New York
Subtitle : U. P. Hedrick
Info : 47 downloads
Link : https://m.gutenberg.org/ebooks/5860.mobile
Title : Personal Memoirs of U. S. Grant, Part 1.
Subtitle : Ulysses S. Grant
Info : 46 downloads
Link : https://m.gutenberg.org/ebooks/51076.mobile
Title : Aaron Rodd, Diviner
Subtitle : E. Phillips Oppenheim
Info : 34 downloads
Link : https://m.gutenberg.org/ebooks/45978.mobile
Title : The Grapes of New York
Subtitle : U. P. Hedrick
Info : 33 downloads
Link : https://m.gutenberg.org/ebooks/46347.mobile
Title : Men of Our Times; Or, Leading Patriots of the Day
Subtitle : Harriet Beecher Stowe
Info : 31 downloads
Link : https://m.gutenberg.org/ebooks/4546.mobile
Title : Memoirs of the Union's Three Great Civil War Generals
Subtitle : Ulysses S. Grant, William T. Sherman, and Philip Henry Sheridan
Info : 30 downloads
Link : https://m.gutenberg.org/ebooks/47263.mobile
Title : The Peaches of New York
Subtitle : U. P. Hedrick
Info : 30 downloads
Link : https://m.gutenberg.org/ebooks/39626.mobile
Title : An Alphabet of History
Subtitle : Wilbur D. Nesbit
Info : 28 downloads
Link : https://m.gutenberg.org/ebooks/46994.mobile
Title : The Pears of New York
Subtitle : U. P. Hedrick
Info : 27 downloads
Link : https://m.gutenberg.org/ebooks/43982.mobile
Title : Stories of the Old World
Subtitle : Alfred John Church
Info : 26 downloads
Link : https://m.gutenberg.org/ebooks/28386.mobile
Title : Ulysses S. Grant
Subtitle : Walter Allen
Info : 25 downloads