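I'm trying to parse the sidebar table of contents of a documentation site, but I can only get child elements down to depth 2; everything nested deeper is missing. Here is my code: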
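import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public String ExtractHTMLByIDandDomain(String URL, String ID) throws IOException
{
    // Fetch the page and parse the HTML returned by the server.
    Document doc = Jsoup.connect(URL).get();
    Element element = doc.getElementById(ID); // null if no element has this id
    System.out.println(element);
    return element == null ? null : element.outerHtml(); // match the String return type
}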
Page being scraped: https://docs.microsoft.com/en-us/ef/ef6/
Output:
<nav class="sidebar" id="sidebar" data-bi-name="left toc" role="navigation" aria-label="Main Navigation">
<button class="sidebar-header" type="button" aria-label="Close" data-bi-name="contents-collapse"> <span>Contents</span> <span class="docon docon-navigate-close" aria-hidden="true"></span> </button>
<div id="sidebarContent">
<div class="filterHolder">
</div>
<nav class="toc"></nav>
<div class="pdfDownloadHolder"></div>
</div>
</nav>
Expected output:
<nav class="sidebar" id="sidebar" data-bi-name="left toc" role="navigation" aria-label="Main Navigation">
<button class="sidebar-header" type="button" aria-label="Close" data-bi-name="contents-collapse"> <span>Contents</span> <span class="docon docon-navigate-close" aria-hidden="true"></span> </button>
<div id="sidebarContent">
<div class="filterHolder">
</div>
...
<nav class="toc" role="application" aria-label="Table of Contents" id="filterResults">
<ul role="tree" onclick="msDocs.functions.stopSomePropagation(event, &quot;top&quot;)" class="noSibs hideFocus">
<li role="group" aria-expanded="true" aria-label="entity framework " onclick="event.stopPropagation();msDocs.functions.toggleAriaExpanded(this)">
<a onclick="msDocs.functions.stopSomePropagation(event, &quot;left&quot;)" tabindex="0" href="/en-us/ef/" data-text="entity framework " class="x-hidden-focus">Entity Framework</a>
<ul role="tree" onclick="msDocs.functions.stopSomePropagation(event, &quot;top&quot;)">
...
</li>
</ul>
</nav>
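
To narrow down where the two outputs diverge, it may help to check whether the `nav.toc` element in the fetched HTML is simply an empty placeholder. Jsoup parses the markup it receives and does not run scripts, so a likely explanation (an assumption here, not something confirmed for this site) is that the browser fills in the tree after the page loads. Below is a minimal sketch of that check; the class name `TocCheck` and the helper `isEmptyPlaceholder` are hypothetical names made up for illustration, and the sample string is a shortened version of the output above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TocCheck {
    // Hypothetical helper: true when the first element matching cssQuery
    // exists but contains no child elements.
    static boolean isEmptyPlaceholder(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element el = doc.selectFirst(cssQuery);
        return el != null && el.children().isEmpty();
    }

    public static void main(String[] args) {
        // Shortened copy of the markup Jsoup printed for the sidebar.
        String fetched = "<nav class=\"sidebar\" id=\"sidebar\">"
                + "<div id=\"sidebarContent\">"
                + "<nav class=\"toc\"></nav>"
                + "</div></nav>";
        System.out.println(isEmptyPlaceholder(fetched, "nav.toc")); // prints true
    }
}
```

If the check returns true for the HTML that `Jsoup.connect(URL).get()` actually fetched, the missing levels were never in the server response, so no selector or traversal depth setting in Jsoup would recover them.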