HTML to parse:
<table width="100%" border="0" cellpadding="0" cellspacing="0" class="ms-bottompaging" xmlns:x="http://www.w3.org/2001/XMLSchema" xmlns:d="http://schemas.microsoft.com/" xmlns:asp="http://schemas.microsoft.com/ASPNET/20" xmlns:pcm="urn:PageContentManager" xmlns:ddwrt2="urn:frontpage:internal">
<tbody>
<tr>
<td class="ms-bottompagingline1"><img src="/_images/11/images/blank.gif?rev=40" width="1" height="1" alt="" /></td>
</tr>
<tr>
<td class="ms-bottompagingline2"><img src="/_images/11/images/blank.gif?rev=40" width="1" height="1" alt="" /></td>
</tr>
<tr>
<td class="ms-vb" id="bottomPagingCellWPQ2" align="center">
<table>
<tbody>
<tr>
<td class="ms-paging">1 - 15</td>
<td><a onclick="javascript:RefreshPageTo(event, "/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&p_SortBehavior=0&p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&p_ID=85&RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&PageFirstRow=16&&View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}");javascript:return false;" href="javascript:"><img src="/_layouts/15/1011/images/next.gif" border="0" alt="Next" /></a></td>
</tr>
</tbody>
</table></td>
</tr>
<tr>.......
How can I get the value inside the onclick attribute of the <a> element from the HTML above?
Expected output:
"/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&p_SortBehavior=0&p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&p_ID=85&RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&PageFirstRow=16&&View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}"
I tried the following code, but the output is not what I expected.
File input = new File("myHtml.html");
Document doc = Jsoup.parse(input, "UTF-8");
// Goal: the onclick value of the <a> in the cell next to <td class="ms-paging">1 - 15</td>
Elements links = doc.select(".ms-paging > td > a");
System.out.println("size : " + links.size()); // prints 0
for (Element link : links) {
    System.out.println(link); // never reached; it should print the link
}
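Answer 0 (score: 0)
You need the ~ (general sibling) combinator to select the td that comes after the td class="ms-paging" element. Your selector .ms-paging > td > a looks for a td that is a child of the ms-paging cell, but the cell containing the link is its sibling, so nothing matches. The following works for me: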
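File input = new File("myHtml.html");
Document doc = Jsoup.parse(input, "UTF-8");
// "~" matches any later sibling td of td.ms-paging; "> a" takes its direct <a> child
Elements elements = doc.select("td.ms-paging ~ td > a");
for (Element e : elements) {
    String attrValue = e.attr("onclick");
    // keep the text between the first and the last double quote
    System.out.println(attrValue.substring(attrValue.indexOf("\"") + 1,
            attrValue.lastIndexOf("\"")));
}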
This prints the expected value:
/sites/myAppDetail/My%20Documents/Forms/AllApplicationss.aspx?Paged=TRUE&p_SortBehavior=0&p_FileLeafRef=LT%5fSW%20TEAM%5fNatural%5fItemCode%5f20170909%5fvstatus%2epdf&p_ID=85&RootFolder=%2fmyData%2fFolder3%2fCommon%20Docs%2fdaily%20Report%2f2017&PageFirstRow=16&&View={05465DFA-110E-21FC-8AD6-8B9846567FF8B}
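One caveat with the substring call above: it assumes the onclick text contains at least two double quotes. With none, or only one, substring throws a StringIndexOutOfBoundsException, and if the script held several quoted strings it would return everything from the first quote to the last. If that matters for your pages, a regular expression that takes only the first double-quoted argument may be a safer option. The sketch below uses just the JDK's java.util.regex; the class name OnclickArgument, the method firstQuoted, and the shortened sample string are made up for illustration and are not part of jsoup or of your page.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of jsoup): extracts the first double-quoted
// argument from a script string such as the value returned by e.attr("onclick").
final class OnclickArgument {
    // A double quote, then any run of non-quote characters, then a closing quote.
    private static final Pattern FIRST_QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Returns the text between the first pair of double quotes, or empty if there is none.
    static Optional<String> firstQuoted(String onclick) {
        Matcher m = FIRST_QUOTED.matcher(onclick);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real attribute value (an assumption for this demo).
        String onclick = "javascript:RefreshPageTo(event, "
                + "\"/sites/x.aspx?Paged=TRUE&PageFirstRow=16\");javascript:return false;";
        System.out.println(firstQuoted(onclick).orElse("(no quoted argument)"));
    }
}
```

In the loop above you would pass e.attr("onclick") to firstQuoted and decide what to do when the result is empty, instead of letting substring throw.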
Hope it helps!