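I am using HtmlUnit to scrape a web page. The page has a number of inputs that I set programmatically before clicking the submit button. The analysis results are then returned on the same page, below the inputs.
The parser works on the first pass through the loop but fails on the second. Here is the code: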
public void getPortfolioVisualizerData(List<String> symbols) throws Exception {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("https://www.portfoliovisualizer.com/backtest-portfolio#analysisResults");
    HtmlForm form = page.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");
    // Time Period combobox
    HtmlSelect select = (HtmlSelect) page.getElementById("timePeriod");
    HtmlOption option = select.getOptionByValue("4");
    select.setSelectedAttribute(option, true);
    // Start Year combobox
    select = (HtmlSelect) page.getElementById("startYear");
    option = select.getOptionByValue("1985");
    select.setSelectedAttribute(option, true);
    // End Year combobox
    select = (HtmlSelect) page.getElementById("endYear");
    option = select.getOptionByValue("2018");
    select.setSelectedAttribute(option, true);
    // Initial Amount text input
    HtmlTextInput textField = form.getInputByName("initialAmount");
    textField.type("10000");
    // Periodic Adjustment combobox
    select = (HtmlSelect) page.getElementById("annualOperation");
    option = select.getOptionByValue("0");
    select.setSelectedAttribute(option, true);
    // Rebalancing combobox
    select = (HtmlSelect) page.getElementById("rebalanceType");
    option = select.getOptionByValue("1");
    select.setSelectedAttribute(option, true);
    // Display Income combobox
    select = (HtmlSelect) page.getElementById("showYield");
    option = select.getOptionByValue("false");
    select.setSelectedAttribute(option, true);
    // Benchmark combobox
    select = (HtmlSelect) page.getElementById("benchmark");
    option = select.getOptionByValue("VFINX");
    select.setSelectedAttribute(option, true);
    // Allocation 1 text input
    textField = form.getInputByName("allocation1_1");
    textField.type("100");
    HtmlSubmitInput button = (HtmlSubmitInput) page.getElementById("submitButton");
    Data data = new Data();
    for (String symbol : symbols) {
        // Asset 1 text input
        textField = form.getInputByName("symbol1");
        textField.type(symbol);
        // Now submit the form by clicking the Analyze Portfolios button and get back the second page.
        HtmlPage page2 = button.click();
        HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1); // the second table on the page
        int rowNum = 0;
        for (HtmlTableRow row : table.getRows()) {
            rowNum++;
            if (rowNum == 1) continue; // skip table header values
            int colNum = 0;
            for (HtmlTableCell cell : row.getCells()) {
                colNum++;
                if (rowNum == 2) {
                    data.Symbol = symbol;
                    String val = cell.asText();
                    switch (colNum) {
                        case 4: data.CAGR = val.replace("%", ""); break;
                        case 5: data.StdDev = val.replace("%", ""); break;
                        case 6: data.BestYear = val.replace("%", ""); break;
                        case 7: data.WorstYear = val.replace("%", ""); break;
                        case 8: data.MaxDrawdown = val.replace("%", ""); break;
                        case 9: data.SharpRatio = val; break;
                        case 10: data.SortinoRatio = val; break;
                        case 11: data.CorrelationToUsMkt = val;
                    }
                }
            }
        }
        saveStock(data);
        // Reuse the button and form from the returned page for the next symbol
        button = (HtmlSubmitInput) page2.getElementById("submitButton");
        form = page2.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");
    }
}
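It throws a java.lang.IndexOutOfBoundsException: Index: 1, Size: 0 on this line:
HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1); //the second table on the page
The table I am interested in is the second one on the page, but the error suggests that no tables at all were found on the second pass through the loop. Why not? If I enter the second symbol manually in a browser, the page returns the table I am after.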
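For reference, the failing lookup can be guarded so that an empty result is reported rather than thrown. The following is only a minimal sketch: it uses plain lists as stand-ins for the XPath result, and `SafeLookup` and `nth` are hypothetical names, not part of HtmlUnit.

```java
import java.util.List;
import java.util.Optional;

final class SafeLookup {
    /** Returns the element at index, or Optional.empty() if the index is out of range. */
    static <T> Optional<T> nth(List<T> items, int index) {
        if (index < 0 || index >= items.size()) {
            return Optional.empty();
        }
        return Optional.ofNullable(items.get(index));
    }

    public static void main(String[] args) {
        // Stand-ins for what getByXPath(...) might return:
        // a page with two matching tables, and a page with none.
        List<String> firstPass = List.of("summary table", "returns table");
        List<String> secondPass = List.of();

        System.out.println(nth(firstPass, 1).orElse("no table found"));
        System.out.println(nth(secondPass, 1).orElse("no table found"));
    }
}
```

With a guard like this in place, the empty case could dump `page2.asXml()` to show what the server actually sent back on the second pass.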
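Answer 0 (score: 0)
I think you should add a delay after the click and before fetching the table via XPath. The lookup may be running before the second page has finished loading.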
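A fixed sleep would do, but a polling loop with a timeout is one common alternative. The following is a minimal sketch: `Poll.until` is a hypothetical helper (not part of HtmlUnit), and a stand-in supplier takes the place of the real XPath lookup.

```java
import java.util.Optional;
import java.util.function.Supplier;

final class Poll {
    /**
     * Calls lookup repeatedly until it yields a value or timeoutMillis elapses,
     * sleeping intervalMillis between attempts. Returns empty on timeout or interrupt.
     */
    static <T> Optional<T> until(Supplier<Optional<T>> lookup, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (System.nanoTime() - deadline >= 0) {
                return Optional.empty();
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return Optional.empty();
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for "the results table is not there yet": succeeds on the third attempt.
        int[] attempts = {0};
        Supplier<Optional<String>> lookup = () -> {
            attempts[0]++;
            return attempts[0] >= 3 ? Optional.of("results table") : Optional.<String>empty();
        };
        Optional<String> table = Poll.until(lookup, 2_000, 50);
        System.out.println(table.orElse("timed out") + " after " + attempts[0] + " attempts");
    }
}
```

In the question's loop, the supplier would re-run the `getByXPath` lookup on `page2` and return the second table once it exists. HtmlUnit's `WebClient` also has `waitForBackgroundJavaScript(timeoutMillis)`, which may be enough if the results are rendered by scripts. If waiting does not help, it may be worth checking what the `symbol1` field contains before the second click, since `type` simulates keystrokes rather than replacing the field's value.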