Question

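My task is to read an HTML file, filter out all the HTML tags, and store only the values (i.e., the integers) in an array. I am only allowed to use Scanner and its methods. I have been googling and found suggestions to use .replace or other existing functions that strip all HTML tags easily, but unfortunately I am not allowed to use them.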

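Here is what I have done so far. Each row of the table starts with a line containing a label such as "00-01", "01-02", "02-03", and so on. I use these labels as delimiters, and the code below prints only the HTML between two of them.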

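import java.io.InputStreamReader;
import java.net.URL;
import java.util.Scanner;

// nordpoolURL is the page address (a String defined elsewhere in the class).
public static void getEightDays(int[][] data) throws Exception {
    URL url = new URL(nordpoolURL);
    try (Scanner scan = new Scanner(new InputStreamReader(url.openStream()))) {
        // Skip lines until one containing the first row label "00-01" is found.
        while (scan.hasNextLine() && scan.findInLine("00-01") == null) {
            scan.nextLine();
        }
        // Print the remaining text line by line until the label "01-02" appears.
        while (scan.hasNextLine() && scan.findInLine("01-02") == null) {
            System.out.println(scan.nextLine());
        }
    }
}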

It gives me the following:

</td>
<td align="right"> 11872</td>
<td align="right"> 12146</td>
<td align="right"> 12861</td>
<td align="right"> 12561</td>
<td align="right"> 13493</td>
<td align="right"> 13386</td>
<td align="right"> 12732</td>
<td align="right"> <b>12249</b></td>
</tr>
<tr bgcolor="#ffffff">

Here's the full HTML code of the website I'm trying to read.
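One possible direction, sketched under the assumption that Scanner's own methods (including `useDelimiter`, which takes a regular expression) are allowed: if every HTML tag and every run of whitespace is treated as a delimiter, Scanner only ever returns the text between tags, and `hasNextInt()`/`nextInt()` can pick out the numbers. The class name `StripTagsDemo` and the method `extractInts` are hypothetical, and the input is a shortened copy of the output shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class StripTagsDemo {

    // Treats every HTML tag (<...>) and every whitespace character as a
    // delimiter, so the tokens Scanner returns are only the text between tags.
    static List<Integer> extractInts(String html) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scan = new Scanner(html)) {
            scan.useDelimiter("(?:<[^>]*>|\\s)+");
            while (scan.hasNext()) {
                if (scan.hasNextInt()) {
                    values.add(scan.nextInt());
                } else {
                    scan.next(); // skip non-numeric text such as "00-01"
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String snippet = String.join("\n",
                "</td>",
                "<td align=\"right\"> 11872</td>",
                "<td align=\"right\"> 12146</td>",
                "<td align=\"right\"> <b>12249</b></td>",
                "</tr>",
                "<tr bgcolor=\"#ffffff\">");
        System.out.println(extractInts(snippet)); // prints [11872, 12146, 12249]
    }
}
```

The same delimiter can be set on the Scanner that reads from `url.openStream()`; no separate replace step is needed because the tags are never returned as tokens.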

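In short, my problem is that I can't find a way to strip all the HTML tags using only Scanner methods. Also, with my current approach I would need 24 while loops, one for each of the 24 rows of data, which seems inefficient; there is probably a simpler way.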
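The 24 loops can collapse into one if the row label itself is used to decide where each value goes. This is a sketch under assumptions not confirmed by the question: that each row label (e.g. "00-01") appears as its own text token between tags, that it is followed directly by one integer per day, and that `data` is sized `[24][days]`. `HourTableDemo` and `fillTable` are hypothetical names, and the sample HTML reuses numbers from the output above only as placeholder values.

```java
import java.util.Arrays;
import java.util.Scanner;

public class HourTableDemo {

    // Assumed layout: a label "HH-HH" starts each row, followed by one
    // integer per day. The row index is taken from the label's first hour.
    static void fillTable(Scanner scan, int[][] data) {
        scan.useDelimiter("(?:<[^>]*>|\\s)+"); // tags and whitespace are delimiters
        while (scan.hasNext()) {
            if (scan.hasNext("\\d\\d-\\d\\d")) {
                String label = scan.next();                       // e.g. "00-01"
                int hour = Integer.parseInt(label.substring(0, 2));
                if (hour >= data.length) {
                    continue;                                     // label outside the table
                }
                for (int day = 0; day < data[hour].length && scan.hasNextInt(); day++) {
                    data[hour][day] = scan.nextInt();
                }
            } else {
                scan.next(); // any other text outside a row is skipped
            }
        }
    }

    public static void main(String[] args) {
        String html = "<tr><td>00-01</td><td> 11872</td><td> 12146</td></tr>"
                + "<tr><td>01-02</td><td> 12861</td><td> <b>12561</b></td></tr>";
        int[][] data = new int[24][2]; // 2 days only, to keep the demo short
        try (Scanner scan = new Scanner(html)) {
            fillTable(scan, data);
        }
        System.out.println(Arrays.toString(data[0]) + " " + Arrays.toString(data[1]));
        // prints [11872, 12146] [12861, 12561]
    }
}
```

With the real page, `getEightDays` could pass its URL-backed Scanner and an `int[24][8]` array instead. If the page contains label-shaped tokens outside the table, the scan would need to start at the table (for example by first skipping ahead with `findInLine`, as in the question).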

Please point me in the right direction! Thanks.

Storing an HTML table in an array using Java
