I am trying to fetch table data from the following URL:
I then wrote this code with the help of the Jaunt API:
    package org.open.browser;

    import com.jaunt.Element;
    import com.jaunt.Elements;
    import com.jaunt.JauntException;
    import com.jaunt.UserAgent;

    public class ICICIScraperDemo {
        public static void main(String ar[]) throws JauntException {
            UserAgent userAgent = new UserAgent(); // create new userAgent (headless browser)
            userAgent.visit("https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec");
            // collect every link inside the "expander" divs
            Elements links = userAgent.doc.findEvery("<div class=expander>").findEvery("<a>");
            String url = null;
            for (Element link : links) {
                if (link.innerHTML().equalsIgnoreCase("Company Details")) {
                    url = link.getAt("href");
                }
            }
            /*userAgent = new UserAgent(); */ // reuse the same userAgent instead of creating a new one
            userAgent.visit(url);
            System.out.println(userAgent.getSource());
            Elements results = userAgent.doc.findEvery("<tr>").findEvery("<td>");
            System.out.println(results);
        }
    }
But this did not work.
I then tried another API, HtmlUnit, and wrote the code below:
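    import java.io.IOException;
    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.WebResponse;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import com.gargoylesoftware.htmlunit.html.HtmlTable;
    import com.gargoylesoftware.htmlunit.html.HtmlTableRow;

    public void htmlUnitEx() {
        String START_URL = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
        try {
            WebClient webClient = new WebClient(BrowserVersion.CHROME);
            HtmlPage page = webClient.getPage(START_URL);
            WebResponse webres = page.getWebResponse();
            //List<HtmlAnchor> companyInfo = (List) page.getByXPath("//input[@id='txtStockCode']");
            HtmlTable companyInfo = (HtmlTable) page.getFirstByXPath("//table");
            for (HtmlTableRow item : companyInfo.getBodies().get(0).getRows()) {
                String label = item.getCell(1).asText();
                System.out.println(label);
                if (!label.contains("Registered Office")) {
                    continue;
                }
            }
        } catch (IOException e) {
            // getPage can fail with an I/O error; the original snippet omitted this handler
            e.printStackTrace();
        }
    }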
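But this did not produce any result either.
Can someone help me fetch the data from the above URL and from the other anchor URLs within a single session?
Answer 0 (score: 0)
With HtmlUnit you can do this: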
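    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.DomNode;
    import com.gargoylesoftware.htmlunit.html.DomNodeList;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    String url = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(1000); // wait up to 1000 ms for background JavaScript to finish
        final DomNodeList<DomNode> divs = page.querySelectorAll("div.bigcoll");
        System.out.println(divs.get(1).asText()); // text of the second matching div
    }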
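The question also asks about following the anchor links within a single session. A rough sketch of one way to do that with HtmlUnit is to reuse the same WebClient for both pages, so that cookies set by the first page are sent with the second request. It assumes the same HtmlUnit 2.x release as the snippet above, and that the page really contains an anchor whose visible text is exactly "Company Details" (taken from the question's Jaunt code, not verified against the live site). The class name SingleSessionScrape and the helper followLink are made up for illustration.

```java
import java.io.IOException;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class SingleSessionScrape {

    // Hypothetical helper: loads startUrl, clicks the anchor whose text equals
    // linkText, and returns the text of the page that results from the click.
    // Both requests go through the same WebClient, so they share its cookies.
    static String followLink(WebClient webClient, String startUrl, String linkText)
            throws IOException {
        HtmlPage start = webClient.getPage(startUrl);
        webClient.waitForBackgroundJavaScript(1000); // let scripts build the page

        // Throws ElementNotFoundException if no anchor has exactly this text.
        HtmlAnchor anchor = start.getAnchorByText(linkText);

        // Assumption: the click navigates to a new HTML page. If the link only
        // runs JavaScript in place, click() returns the current page instead.
        HtmlPage target = anchor.click();
        webClient.waitForBackgroundJavaScript(1000);
        return target.asText();
    }

    public static void main(String[] args) throws IOException {
        String url = "https://www.icicidirect.com/idirectcontent/Research/TechnicalAnalysis.aspx/companyprofile/inftec";
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
            System.out.println(followLink(webClient, url, "Company Details"));
        }
    }
}
```

Clicking the anchor, rather than reading its href and calling getPage again, lets HtmlUnit run any onclick handler attached to the link, which may matter on a script-heavy page like this one.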
Two points to note: