JSoup throws IndexOutOfBoundsException when getting data from a Document

Asked: 2017-11-07 19:44:36

Tags: java parsing jsoup

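I'm having a hard time resolving an error I'm getting! In short, I'm trying to grab a specific element from a table in some HTML. Easy, right? That's what I thought. Basically, if I copy the exact page source from the browser and read it from a file, I can find the element I need.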

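However, when I read the document via Jsoup.connect("URL"), I get the error! I've been at this for about 4 hours now, reading around and trying to understand what's going on. I'm fairly confident with JSoup, but this one has me stumped! The code is below: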

private String parseKcal( Element kCalElement ) throws IOException {

    // The exception is thrown on the line below
    String calories = kCalElement.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();
    if (calories == null) {
        throw new IOException();
    }
    return calories;
}

The kCalElement parameter is the document I am trying to extract the element from!

****Error****

Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:653)
at java.util.ArrayList.get(ArrayList.java:429)
at models.Product.parseKcal(Product.java:67)
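
For reference, `Index: 0, Size: 0` means `get(0)` was called on an empty list: JSoup's `Elements` is an `ArrayList` subclass, so the chain of `select` calls matched nothing in the document it was given. The sketch below reproduces that failure mode with a plain list and shows one way to guard against it. The class `FirstMatch` and the helper `firstOf` are hypothetical names used only for illustration and are not part of JSoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of JSoup: shows why get(0) fails on an empty result.
class FirstMatch {

    // Returns the first item, or Optional.empty() when nothing matched.
    static <T> Optional<T> firstOf(List<T> matches) {
        return matches.isEmpty() ? Optional.empty() : Optional.of(matches.get(0));
    }

    public static void main(String[] args) {
        // Stands in for a selector result that matched no elements.
        List<String> noMatches = new ArrayList<>();

        try {
            noMatches.get(0); // same failure mode as in the stack trace above
        } catch (IndexOutOfBoundsException e) {
            System.out.println("get(0) on an empty list throws: " + e.getMessage());
        }

        // The guarded lookup reports the missing match instead of throwing.
        System.out.println(firstOf(noMatches).orElse("no match"));
    }
}
```

Guarding the lookup does not explain why the fetched document lacks the table, but it turns the crash into an explicit "nothing matched" result that is easier to debug.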

****The HTML I am trying to parse****

    <div class="tableWrapper">
    <table class="nutritionTable">
    <thead>
    <tr class="tableTitleRow">
    <th scope="col">Typical Values</th><th scope="col">Per 100g&nbsp;</th><th scope="col">% based on RI for Average Adult</th>
    </tr>
    </thead>
    <tr class="tableRow1">
    <th scope="row" class="rowHeader" rowspan="2">Energy</th><td class="tableRow1">140kJ</td><td class="tableRow1">-</td>
    </tr>
    <tr class="tableRow0">
    <td class="nutritionLevel1">33kcal</td><td class="nutritionLevel1">2%</td>
    </tr>
    <tr class="tableRow1">
    <th scope="row" class="rowHeader">Fat</th><td class="nutritionLevel1">&lt;0.5g</td><td class="nutritionLevel1">-</td>
    </tr>
    <tr class="tableRow0">
    <th scope="row" class="rowHeader">Saturates</th><td class="nutritionLevel1">&lt;0.1g</td><td class="nutritionLevel1">-</td>
    </tr>
    <tr class="tableRow1">
    <th scope="row" class="rowHeader">Carbohydrate</th><td class="tableRow1">6.1g</td><td class="tableRow1">2%</td>
    </tr>
    <tr class="tableRow0">
    <th scope="row" class="rowHeader">Total Sugars</th><td class="nutritionLevel2">6.1g</td><td class="nutritionLevel2">7%</td>
    </tr>
    <tr class="tableRow1">
    <th scope="row" class="rowHeader">Fibre</th><td class="tableRow1">1.0g</td><td class="tableRow1">-</td>
    </tr>
    <tr class="tableRow0">
    <th scope="row" class="rowHeader">Protein</th><td class="tableRow0">0.6g</td><td class="tableRow0">1%</td>
    </tr>
    <tr class="tableRow1">
    <th scope="row" class="rowHeader">Salt</th><td class="nutritionLevel1">&lt;0.01g</td><td class="nutritionLevel1">-</td>
    </tr>
    </table>
    </div>
    <p>RI= Reference Intakes of an average adult (8400kJ / 2000kcal)</p>
    </div>
    </div>

However, when I save the same HTML to a file and parse it from a string, it works! See below:

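    // Java does not expand "~", so the path is built from the user.home property
    File input = new File(System.getProperty("user.home"), "Desktop/file.html");
    Document doc = Jsoup.parse(input, "UTF-8", "");

    // Re-parse the serialized document from a string
    Document document = Jsoup.parse(doc.toString());

    String calories = document.select(".tableWrapper").select("tr").select(".tableRow0").select("td").get(0).text();

    System.out.println(calories);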

Could someone please help before I pull all my hair out? I'm beaten :(

Edit

I'm trying to get the kcal element that contains the calories!
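
One way to narrow this down is to check how many elements a selector matches before calling `get(0)`. The sketch below is only an illustration and assumes jsoup is on the classpath. The class `KcalLookup` and the method `firstKcal` are made-up names, the combined selector `.tableWrapper tr.tableRow0 td` is a swapped-in alternative to the chained `select` calls, and the inline HTML strings stand in for the real page.

```java
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class; assumes jsoup is on the classpath.
class KcalLookup {

    // Returns the text of the first <td> inside a tableRow0 row, if there is one.
    static Optional<String> firstKcal(Document doc) {
        Elements cells = doc.select(".tableWrapper tr.tableRow0 td");
        Element first = cells.first(); // null when the selector matched nothing
        return first == null ? Optional.empty() : Optional.of(first.text());
    }

    public static void main(String[] args) {
        // Illustrative markup only: one snippet with the table, one without it.
        String withTable = "<div class=\"tableWrapper\"><table>"
                + "<tr class=\"tableRow0\"><td>33kcal</td><td>2%</td></tr>"
                + "</table></div>";
        String withoutTable = "<div><p>No nutrition table in this markup</p></div>";

        for (String html : new String[] {withTable, withoutTable}) {
            Document doc = Jsoup.parse(html);
            // The match count shows whether the table exists in the parsed markup.
            System.out.println("tableWrapper matches: " + doc.select(".tableWrapper").size());
            System.out.println("kcal: " + firstKcal(doc).orElse("not found"));
        }
    }
}
```

Running the same size check against the document fetched from the URL should show whether the table is present in that markup at all, which is the difference between the failing and the working case.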

0 answers:

No answers