HtmlUnit gets the table inside a loop, but not on the second pass

Time: 2018-08-19 05:19:34

Tags: java htmlunit

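I am using HtmlUnit to parse a web page. The page has many inputs, which I can set programmatically before clicking the submit button. The analysis results are then returned on the same page, below the inputs.

The parser works fine on the first pass through the loop, but not on the second. Here is the code: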

public void getPortfolioVisualizerData(List<String>symbols) throws Exception {
        final WebClient webClient = new WebClient();
        final HtmlPage page = webClient.getPage("https://www.portfoliovisualizer.com/backtest-portfolio#analysisResults");
        HtmlForm form = page.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");

        //Time Period combobox
        HtmlSelect select = (HtmlSelect) page.getElementById("timePeriod");
        HtmlOption option = select.getOptionByValue("4");   
        select.setSelectedAttribute(option, true);

        //Start Year combobox
        select = (HtmlSelect) page.getElementById("startYear");
        option = select.getOptionByValue("1985");  
        select.setSelectedAttribute(option, true);

        //End Year combobox
        select = (HtmlSelect) page.getElementById("endYear");
        option = select.getOptionByValue("2018");  
        select.setSelectedAttribute(option, true);

        //Initial Amount text input
        HtmlTextInput textField = form.getInputByName("initialAmount");
        textField.type("10000");

        //Periodic Adjustment combobox
        select = (HtmlSelect) page.getElementById("annualOperation");
        option = select.getOptionByValue("0");  
        select.setSelectedAttribute(option, true);

        //Rebalancing combobox
        select = (HtmlSelect) page.getElementById("rebalanceType");
        option = select.getOptionByValue("1");  
        select.setSelectedAttribute(option, true);

        //Display Income combobox
        select = (HtmlSelect) page.getElementById("showYield");
        option = select.getOptionByValue("false");  
        select.setSelectedAttribute(option, true);

        //Benchmark combobox
        select = (HtmlSelect) page.getElementById("benchmark");
        option = select.getOptionByValue("VFINX");  
        select.setSelectedAttribute(option, true);

        //Allocation 1 text input
        textField = form.getInputByName("allocation1_1");
        textField.type("100");
        HtmlSubmitInput button = (HtmlSubmitInput)page.getElementById("submitButton");
        Data data = new Data();

        for (String symbol:symbols) {
            //Asset 1 text input
            textField = form.getInputByName("symbol1");
            textField.type(symbol);

            // Now submit the form by clicking the Analyze Portfolios button and get back the second page.
            HtmlPage page2 = button.click();
            HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1);   //the second table on the page
            int rowNum = 0;
            for (HtmlTableRow row : table.getRows()) {
                rowNum++;
                if (rowNum==1) continue;    //skip table header values
                int colNum = 0;
                for (HtmlTableCell cell : row.getCells()) {
                    colNum++;
                    if (rowNum==2) {
                        data.Symbol = symbol;
                        String val = cell.asText();
                        switch(colNum) {
                            case 4:  data.CAGR               = val.replace("%", ""); break;
                            case 5:  data.StdDev             = val.replace("%", ""); break;
                            case 6:  data.BestYear           = val.replace("%", ""); break;
                            case 7:  data.WorstYear          = val.replace("%", ""); break;
                            case 8:  data.MaxDrawdown        = val.replace("%", ""); break;
                            case 9:  data.SharpRatio         = val;                  break;
                            case 10: data.SortinoRatio       = val;                  break;
                            case 11: data.CorrelationToUsMkt = val;
                        }
                    }

                }
            }
            saveStock(data);
            button = (HtmlSubmitInput)page2.getElementById("submitButton");
            form = page2.getFirstByXPath("//form[@action='backtest-portfolio#analysisResults']");
        }
    }

It throws a java.lang.IndexOutOfBoundsException: Index: 1, Size: 0 on this line:

HtmlTable table = (HtmlTable) page2.getByXPath("//table[@class='table table-striped table-condensed']").get(1);   //the second table on the page

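The table of interest is the second table on the page, but the error suggests that no matching table was found at all on the second pass through the loop. Why not? If I enter the second symbol manually, the page returns the table of interest.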

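1 Answer:

Answer 0: (score: 0)

I think you should add a delay after the click and before getting the table via XPath. The lookup may be running before the second page has finished loading.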
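
One way to add such a delay without hard-coding a fixed sleep is to poll the XPath query until it returns enough tables or a timeout expires. The sketch below is only an illustration under that assumption: `PollingWait` and `waitForSize` are hypothetical names, not part of HtmlUnit, and the helper uses nothing but the Java standard library.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper (not part of HtmlUnit): re-runs a query until it yields enough results. */
final class PollingWait {

    private PollingWait() {}

    /**
     * Calls query repeatedly until it returns at least minSize elements.
     * Throws IllegalStateException if the timeout expires or the thread is interrupted.
     */
    static <T> List<T> waitForSize(Supplier<List<T>> query, int minSize,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            List<T> result = query.get();
            if (result != null && result.size() >= minSize) {
                return result;
            }
            if (System.nanoTime() - deadline >= 0) {
                int found = (result == null) ? 0 : result.size();
                throw new IllegalStateException("Expected at least " + minSize
                        + " results but found " + found + " after " + timeoutMillis + " ms");
            }
            try {
                Thread.sleep(pollMillis);   // give the page time to finish loading
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting for results", e);
            }
        }
    }
}
```

In the question's loop, the failing line could then wrap the lookup in a lambda, for example `PollingWait.waitForSize(() -> page2.getByXPath("//table[@class='table table-striped table-condensed']"), 2, 10000, 250).get(1)`; the 10-second timeout and 250 ms interval are arbitrary values chosen for illustration. If the results are rendered by JavaScript, HtmlUnit's own `webClient.waitForBackgroundJavaScript(timeoutMillis)` may be a simpler alternative. If the table still never appears, the timeout message at least confirms that the page content, rather than timing, is the problem.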