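Trying to fix HTML table colspans and rowspans in Java

Time: 2018-06-12 16:01:22

Tags: java html html-table jsoup

I am trying to convert an HTML table with rowspans and colspans into a 2D array in Java. I found a solution that uses the Java 8 Stream API in the question "Extract data from complex HTML tables to 2d array in Java".

But I need a solution without streams. I almost have one, but I sometimes get null in some cells.

Code: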

private static String[][] tableFix2(Elements trElements){
    String table[][] = new String[trElements.first().select("td").size()] 
    [trElements.size()];



    for (int tr = 0; tr < trElements.size(); tr++){
        Elements tdElements = trElements.get(tr).select("td");


        for (int td = 0; td < tdElements.size();td++) {
            Element tdEl = tdElements.get(td);
            String tdElString = tdEl.text();
            //System.out.println(tdElString);

            int colspan = tdEl.attr("colspan").equals("") ? 1 : Integer.parseInt(tdEl.attr("colspan"));
            int rowspan = tdEl.attr("rowspan").equals("") ? 1 : Integer.parseInt(tdEl.attr("rowspan"));

            if (colspan > 1 && rowspan <= 1) {

                for (int c = td; c < td + rowspan; c++) {
                    if (table[c][tr] == null)
                        table[c][tr] = tdElString;
                }


            } else if (rowspan > 1 && colspan <= 1) {

                for (int r = tr; r < tr + rowspan; r++) {
                    if (table[td][r] == null)
                        table[td][r] = tdElString;
                }


            } else if (rowspan > 1 && colspan > 1) {
                for (int r = tr; r < tr + rowspan; r++) {
                    for (int c = td; c < td + colspan; c++) {
                        if (table[c][r] == null)
                            table[c][r] = tdElString;
                    }
                }

            } else {
                if (table[td][tr] == null)
                    table[td][tr] = tdElString;
            }
        }

    }
    System.out.println(Arrays.deepToString(table));


    return table;
}

trElements is the input to this function; I use Jsoup to get all the tr elements of the table.

My table (the image is not reproduced here; its HTML is given below):

Output:

[[A, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
[B, 8:25 - 7:40, 8:30-9:15, 9:20-10:05, 10:25-11:10, 11:15-12:00, 12:05-12:50, 13:10-13:55, 14:00-14:45, 14:50-15:35, 15:40-16:25], 
[C, , classes,classes, , , , , , , ], 
[D, homework,homework, , , , , , , , ], 
[E, , , , , , , , , , ], 
[F, , , , , , , , , , ], 
[G, , , , playing,playing, , , , , ], 
[H, , null, null, sleeping,sleeping, , , , , ]]

HTML code:

<table class="c6" dir="rtl">
<tbody>
<tr class="c23" style="height: 31px;">
<td class="c15" style="width: 19px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">A</span></p>
</td>
<td class="c12" style="width: 74px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">B</span></p>
</td>
<td class="c34" style="width: 68px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">C</span></p>
</td>
<td class="c13" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">D</span></p>
</td>
<td class="c27" style="width: 21px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">E</span></p>
</td>
<td class="c31" style="width: 21px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">F</span></p>
</td>
<td class="c41" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">G</span></p>
</td>
<td class="c41" style="width: 88px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c26">H</span></p>
</td>
</tr>
<tr class="c16" style="height: 45px;">
<td class="c15" style="width: 19px; height: 45px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">1</span></p>
</td>
<td class="c12" style="width: 74px; height: 45px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">8:25 - 7:40</span></p>
</td>
<td class="c11" style="width: 68px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 91px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
homework</td>
<td class="c14" style="width: 21px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 45px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 46px;">
<td class="c15" style="width: 19px; height: 46px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">2</span></p>
</td>
<td class="c12" style="width: 74px; height: 46px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">8:30-9:15</span></p>
</td>
<td class="c11" style="width: 68px; height: 77px;" colspan="1" rowspan="2">
<p class="c3" dir="rtl">&nbsp;</p>
classes&nbsp;</td>
<td class="c14" style="width: 21px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 46px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 31px;">
<td class="c15" style="width: 19px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">3</span></p>
</td>
<td class="c33" style="width: 74px; height: 31px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">9:20-10:05</span></p>
</td>
<td class="c19" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 31px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c47" style="height: 51px;">
<td class="c36" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">4</span></p>
</td>
<td class="c29" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">10:25-11:10</span></p>
</td>
<td class="c44" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 102px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
playing</td>
<td class="c9" style="width: 88px; height: 102px;" colspan="1" rowspan="2">
<p class="c1 c18" dir="rtl">&nbsp;</p>
sleeping</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">5</span></p>
</td>
<td class="c53" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">11:15-12:00</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c39" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">6</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">12:05-12:50</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">7</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">13:10-13:55</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">8</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">14:00-14:45</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">9</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">14:50-15:35</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
<tr class="c16" style="height: 51px;">
<td class="c15" style="width: 19px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c8">10</span></p>
</td>
<td class="c12" style="width: 74px; height: 51px;" colspan="1" rowspan="1">
<p class="c1" dir="rtl"><span class="c21">15:40-16:25</span></p>
</td>
<td class="c11" style="width: 68px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c19" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c14" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c10" style="width: 21px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
<td class="c9" style="width: 88px; height: 51px;" colspan="1" rowspan="1">&nbsp;</td>
</tr>
</tbody>
</table>

What is wrong here?

1 answer:

Answer 0: (score: 0)

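Just count the number of tds in the 3rd and 4th rows and you will have your answer. You only iterate up to tdlist.length (tdElements.size() in your code), and some rows of your input do not have enough tds. A cell with rowspan="2" has no td of its own in the following row, so that row is one td short. From that point on the td index no longer matches the real column index, and the last column is never written, which is where the null values come from.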
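One possible way to fix this without streams is to keep a separate column cursor for each row and advance it past every slot that a rowspan from an earlier row has already filled. The code below is only a minimal sketch of that idea, not a tested drop-in replacement. The names SpanTableSketch, Cell, toGrid, cell and row are hypothetical, and Cell stands in for a jsoup td so that the example runs without the library. With jsoup, each Cell would presumably be built from tdEl.text() and the parsed colspan and rowspan values, as in the question. The sketch assumes the first row spans the full width of the table and that cell text is never null. Note also that the colspan-only branch in the question loops to td + rowspan where td + colspan was presumably intended; a single fill loop avoids that case split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpanTableSketch {

    /** Hypothetical stand-in for a jsoup td: its text plus its span attributes. */
    static final class Cell {
        final String text;
        final int colspan;
        final int rowspan;

        Cell(String text, int colspan, int rowspan) {
            this.text = text;
            this.colspan = colspan;
            this.rowspan = rowspan;
        }
    }

    static Cell cell(String text, int colspan, int rowspan) {
        return new Cell(text, colspan, rowspan);
    }

    static List<Cell> row(Cell... cells) {
        return new ArrayList<>(Arrays.asList(cells));
    }

    /** Fills table[column][row], the same orientation as the array in the question. */
    static String[][] toGrid(List<List<Cell>> rows) {
        // Width is the sum of the colspans in the first row, not its number of tds.
        int width = 0;
        for (Cell c : rows.get(0)) {
            width += c.colspan;
        }
        int height = rows.size();
        String[][] table = new String[width][height];

        for (int r = 0; r < height; r++) {
            int col = 0; // real grid column, tracked separately from the td index
            for (Cell c : rows.get(r)) {
                // Skip slots already claimed by a rowspan from an earlier row.
                while (col < width && table[col][r] != null) {
                    col++;
                }
                // Copy the text into every slot the cell covers, clamped to the grid.
                int lastRow = Math.min(r + c.rowspan, height);
                int lastCol = Math.min(col + c.colspan, width);
                for (int rr = r; rr < lastRow; rr++) {
                    for (int cc = col; cc < lastCol; cc++) {
                        table[cc][rr] = c.text;
                    }
                }
                col += c.colspan;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<List<Cell>> rows = new ArrayList<>();
        rows.add(row(cell("A", 1, 1), cell("B", 1, 1), cell("C", 1, 1)));
        rows.add(row(cell("1", 1, 1), cell("x", 1, 2), cell("y", 1, 1)));
        rows.add(row(cell("2", 1, 1), cell("z", 1, 1))); // one td short because of x
        System.out.println(Arrays.deepToString(toGrid(rows)));
        // prints [[A, 1, 2], [B, x, x], [C, y, z]]
    }
}
```

In the small demo, the third row has only two tds, just like the 3rd and 4th rows of the table in the question, yet "z" still lands in column C because the cursor skips the slot that "x" already occupies.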