I am trying to use Jsoup to get all the HTML lists rendered by ul tags from the page above.
Here is my code:
Document doc = Jsoup.connect("http://www.unc.edu/academics/").get();
Elements lists = doc.select("ul");
for (Element list : lists) {
    Elements li = list.select("li a");
    if (li.size() > 0) {
        ArrayList<String> anchors = new ArrayList<String>();
        for (Element e : li) {
            anchors.add(e.text());
        }
        System.out.println(anchors);
    }
}
Here is the output:
[Calendar, Libraries, Maps, Departments, MyUNC]
[About UNC, Academics, Research, Public Service, Health Care, UNC Global, Arts, Athletics]
[Academic Departments, Continuing Education, Distance Education, Provost, Services and Resources]
[Academic Calendar, Courses, Libraries, Registrar, Sakai]
[College of Arts & Sciences, Dentistry, Education, Eshelman School of Pharmacy, Friday Center for Continuing Education, General College, Gillings School of Global Public Health, Graduate School, Kenan-Flagler Business School, Government, Information & Library Science, Journalism & Mass Communication, Law, Medicine, Nursing, Social Work, Summer School]
[Departments A-Z, Departments by Interest Area]
[American Indian Studies, APPLES Service-Learning, Applied Sciences & Engineering, Archaeology, Bioinformatics & Computational Biology Training, Biological & Biomedical Sciences, Burch Fellows, Business (Undergraduate), Carolina Entrepreneurial Initiative, Christianity & Culture, Cinema, Cognitive Science, Comparative Literature, Communication Studies, Creative Writing, Cultural Studies, Developmental Biology Training, Ethnicity, Culture & Health Outcomes, Environment & Ecology, European Studies, First Year Seminars, Folklore, Genetics & Molecular Biology, Global Studies, Honors, Humanities & Human Values, Institute for Environment, Jewish Studies, Johnston Center for Undergraduate Excellence, Languages Across Curriculum, Latin American Studies, Latina/o Studies, Management & Society, Mathematical Decision Sciences, Mathematical Sciences, Medieval & Early Modern Studies, Middle East/Muslim Civilizations, Molecular Biology & Biotechnology, Molecular/Cellular Biophysics, Morehead-Cain Scholarship, Neurobiology, Peace, War & Defense, Philosophy, Politics & Economics, Program on Health Outcomes, Public Administration, Public Health Leadership, Russian/East European Studies, Robertson Scholars, Sexuality Studies, Social & Economic Justice, SPIRE Postdoctoral Program, Stone Center, Study Abroad, SURE, Toxicology, Transatlantic Master’s Program, Undergraduate Curricula, World View, Writing for Screen & Stage]
[Alert Carolina, Contact, Departments, Directory, Employment, FAQs, ITS, Privacy Policy, Accessibility, RSS Feeds]
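You may notice that the three lists shown in the image below are merged into one, namely the fifth list in the output.
As you can see in the page source, these three lists are indeed rendered by three ul tags. Could this have something to do with the JavaScript or CSS embedded in the page?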
Answer 0 (score: 4)
The source code really does contain them as a single list.
<ul class="col3">
<li><a href="http://artsandsci.unc.edu/">College of Arts & Sciences</a></li>
<li><a href="http://www.dentistry.unc.edu/">Dentistry</a></li>
<li><a href="http://soe.unc.edu/">Education</a></li>
<li><a href="http://www.pharmacy.unc.edu/">Eshelman School of Pharmacy</a></li>
<li><a href="http://www.fridaycenter.unc.edu/">Friday Center for Continuing Education</a></li>
<li><a href="http://advising.unc.edu/">General College</a></li>
<li><a href="http://www.sph.unc.edu/">Gillings School of Global Public Health</a></li>
<li><a href="http://gradschool.unc.edu/">Graduate School</a></li>
<li><a href="http://www.kenan-flagler.unc.edu/">Kenan-Flagler Business School</a></li>
<li><a href="http://www.sog.unc.edu/">Government</a></li>
<li><a href="http://sils.unc.edu/">Information & Library Science</a></li>
<li><a href="http://www.jomc.unc.edu/">Journalism & Mass Communication</a></li>
<li><a href="http://www.law.unc.edu/">Law</a></li>
<li><a href="http://www.med.unc.edu/">Medicine</a></li>
<li><a href="http://nursing.unc.edu/">Nursing</a></li>
<li><a href="http://ssw.unc.edu/">Social Work</a></li>
<li><a href="http://summer.unc.edu/">Summer School</a></li>
</ul>
But the JavaScript splits it into three separate <ul> elements.
jQuery(document).ready(function($) {
    $('div.accordion > ul').makeacolumnlists({
        cols: 3,
        colWidth: '33%',
        equalHeight: false,
        startN: 1
    });
    $('div.accordion > div > ul').accordion({
        autoHeight: false,
        header: '> li > h4',
        collapsible: true,
        active: false
    });
    $('ul.col2').makeacolumnlists({
        cols: 2,
        colWidth: 0,
        equalHeight: false,
        startN: 1
    });
    $('ul.col3').makeacolumnlists({
        cols: 3,
        colWidth: 0,
        equalHeight: false,
        startN: 1
    });
});
The makeacolumnlists call on ul.col3 at the end is what does the trick.
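Jsoup parses the HTML as it is served and does not execute JavaScript, which is why it sees the one unsplit list. If the three-column grouping is needed on the Java side, one option is to split the extracted anchors after the fact. The following is only a minimal sketch: `ColumnSplitter` and `split` are hypothetical names, and it assumes the columns are filled in order with ceil(n / cols) items each. How makeacolumnlists actually distributes items is not shown in the page source above, so the real column boundaries may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of Jsoup or of the page's own script.
class ColumnSplitter {

    // Splits items into at most `cols` consecutive chunks of ceil(n / cols) items.
    // Assumption: columns are filled in order; the real plugin may balance differently.
    static <T> List<List<T>> split(List<T> items, int cols) {
        if (cols <= 0) {
            throw new IllegalArgumentException("cols must be positive");
        }
        List<List<T>> columns = new ArrayList<>();
        int perColumn = (items.size() + cols - 1) / cols; // ceiling division
        for (int start = 0; start < items.size(); start += perColumn) {
            int end = Math.min(start + perColumn, items.size());
            columns.add(new ArrayList<>(items.subList(start, end)));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Stand-in for the anchor texts collected with Jsoup.
        List<String> anchors = Arrays.asList("Dentistry", "Education", "Law", "Medicine", "Nursing");
        for (List<String> column : split(anchors, 3)) {
            System.out.println(column);
        }
    }
}
```

In the question's loop, the `anchors` list built for the `ul.col3` element could be passed to `split(anchors, 3)` in place of printing it directly.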
Answer 1 (score: 1)
The source has just one <ul> tag. It uses CSS to make it display as 3 columns (class="col3").
If Chrome is giving you incorrect information, I think it is probably the JavaScript that is throwing you off.
<ul class="col3">
<li><a href="http://artsandsci.unc.edu/">College of Arts & Sciences</a></li>
<li><a href="http://www.dentistry.unc.edu/">Dentistry</a></li>
<li><a href="http://soe.unc.edu/">Education</a></li>
<li><a href="http://www.pharmacy.unc.edu/">Eshelman School of Pharmacy</a></li>
<li><a href="http://www.fridaycenter.unc.edu/">Friday Center for Continuing Education</a></li>
<li><a href="http://advising.unc.edu/">General College</a></li>
<li><a href="http://www.sph.unc.edu/">Gillings School of Global Public Health</a></li>
<li><a href="http://gradschool.unc.edu/">Graduate School</a></li>
<li><a href="http://www.kenan-flagler.unc.edu/">Kenan-Flagler Business School</a></li>
<li><a href="http://www.sog.unc.edu/">Government</a></li>
<li><a href="http://sils.unc.edu/">Information & Library Science</a></li>
<li><a href="http://www.jomc.unc.edu/">Journalism & Mass Communication</a></li>
<li><a href="http://www.law.unc.edu/">Law</a></li>
<li><a href="http://www.med.unc.edu/">Medicine</a></li>
<li><a href="http://nursing.unc.edu/">Nursing</a></li>
<li><a href="http://ssw.unc.edu/">Social Work</a></li>
<li><a href="http://summer.unc.edu/">Summer School</a></li>
</ul>