Answer 0 (score: 1)
As suggested in the comments, BeautifulSoup makes this very simple:
In [2]: from bs4 import BeautifulSoup
In [3]: import urllib2
In [4]: url = "http://www.dlib.org/dlib/november14/brook/11brook.html"
In [5]: soup = BeautifulSoup(urllib2.urlopen(url), "html.parser")
In [6]: for h3 in soup.find_all("h3"):
   ...:     print(h3.text)
   ...:
D-Lib Magazine
The Social, Political and Legal Aspects of Text and Data Mining (TDM)
Abstract
1. Introduction
2. Copyright, database right, licences and TDM
3. Recent changes to UK law
4. What can politicians and policy makers do?
5. Publishers are not embracing opportunities of TDM
6. How can publishers help TDM researchers?
7. Awareness among academics and a technological gap
8. Conclusion
Notes
References
About the Authors
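
The session above uses `urllib2`, which exists only in Python 2. For Python 3, a rough equivalent might look like the sketch below. It assumes `bs4` is installed and uses the standard library's `html.parser` backend. The helper names `h3_headings` and `fetch_h3_headings` and the inline `sample` markup are made up for illustration and are not part of the original answer.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def h3_headings(html):
    """Return the text of every <h3> element in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(strip=True) trims surrounding whitespace from each heading.
    return [h3.get_text(strip=True) for h3 in soup.find_all("h3")]


def fetch_h3_headings(url):
    """Download a page and return its <h3> headings (needs network access)."""
    with urlopen(url) as response:
        return h3_headings(response.read())


# Offline demonstration on a hypothetical inline snippet.
sample = "<h3>Abstract</h3><p>text</p><h3> 1. Introduction </h3>"
print(h3_headings(sample))
```

Calling `fetch_h3_headings(url)` with the URL from the session should yield the same headings as a list, provided the page is still reachable. Separating parsing from downloading keeps the parsing step testable without a network connection.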