Parsing a table with Jsoup

Date: 2014-05-24 11:05:36

Tags: android jsoup

I want to parse an HTML table, but I don't understand how to get the values. I have this table:

<table class="aircraftInfoGrid">
            <tbody>
              <tr class="first">
                <td class="iconContainer">
                  <span class="icon aircraft"></span>
                </td>
                <td colspan="2">
                  Aircraft<span class="right" id="aircraftIcaoVal">(A320)</span><br>
                  <span id="aircraftVal" class="strong">Airbus A320-214</span>
                </td>
              </tr>
              <tr>
                <td class="iconContainer"></td>
                <td colspan="2">
                  Registration<span class="right" id="hexVal">(4CA212)</span><br>
                  <span id="registrationVal" class="strong"><a class="regLink" data-reg="EIDEB">EI-DEB</a></span>
                </td>
              </tr>
              <tr>
                <td class="iconContainer">
                  <span class="icon cloud"></span>
                </td>
                <td>
                  Altitude<br>
                  <span id="altitudeVal" class="strong hasTooltip" data-tooltip-align="left" data-tooltip-value="2,438 m">8,000 ft</span>
                </td>
                <td>
                  Vertical Speed<br>
                  <span id="vspdVal" class="strong">0 fpm</span>
                </td>
              </tr>
              <tr>
                <td class="iconContainer"></td>
                <td>
                  Speed<br>
                  <span id="speedVal" class="strong hasTooltip" data-tooltip-align="left" data-tooltip-value="469 km/h, 291 mph">253 kt</span>
                </td>
                <td>
                  Track<br>
                  <span id="trackVal" class="strong">267°</span>
                </td>
              </tr>
              <tr>
                <td class="iconContainer">
                  <span class="icon satellite"></span>
                </td>
                <td>
                  Latitude<br>
                  <span id="latVal" class="strong">51.593</span>
                </td>
                <td>
                  Longitude<br>
                  <span id="lonVal" class="strong">-0.5887</span>
                </td>
              </tr>
              <tr>
                <td class="iconContainer"></td>
                <td>
                  Radar<br>
                  <span id="radarVal" class="strong">N-EGLM2</span>
                </td>
                <td>
                  Squawk<br>
                  <span id="squawkVal" class="strong">7651</span>
                </td>
              </tr>
            </tbody>
          </table>

I don't understand how it is parsed. Here is my code:

doc = Jsoup.connect("http://x.com/EIN1C6/367d800").timeout(10*1000).get();
org.jsoup.nodes.Element tabella = doc.getElementsByClass("aircraftInfoGrid").first();

Iterator<org.jsoup.nodes.Element> iterator = tabella.select("td").iterator();
while(iterator.hasNext()){   
    iterator.next().text();
    System.out.println("TITLE: "+iterator.next().text());
}

I get this output:

    05-24 07:03:18.270: I/System.out(2088): TITLE: Aircraft
    05-24 07:03:18.270: I/System.out(2088): TITLE: Registration
    05-24 07:03:18.280: I/System.out(2088): TITLE: Altitude
    05-24 07:03:18.290: I/System.out(2088): TITLE: 
    05-24 07:03:18.290: I/System.out(2088): TITLE: Track
    05-24 07:03:18.310: I/System.out(2088): TITLE: Latitude
    05-24 07:03:18.310: I/System.out(2088): TITLE: 
    05-24 07:03:18.320: I/System.out(2088): TITLE: Squawk

Could you give me an example? I want to parse all the values of this table... Thanks in advance!
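The output above skips every other cell because `iterator.next()` is called twice per loop iteration: the result of the first call is discarded and only the second cell is printed. The lines with an empty TITLE are the empty `iconContainer` cells that happen to land on the printed position. A minimal sketch of a loop that visits each `<td>` exactly once and prints the cell's own label next to its `span.strong` value, assuming the table HTML is available as a string (the class name `AircraftTableParser` and the shortened HTML are hypothetical, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class; parses a shortened copy of the question's table
public class AircraftTableParser {

    static final String HTML =
        "<table class=\"aircraftInfoGrid\"><tbody>"
      + "<tr class=\"first\"><td class=\"iconContainer\"><span class=\"icon aircraft\"></span></td>"
      + "<td colspan=\"2\">Aircraft<span class=\"right\" id=\"aircraftIcaoVal\">(A320)</span><br>"
      + "<span id=\"aircraftVal\" class=\"strong\">Airbus A320-214</span></td></tr>"
      + "<tr><td class=\"iconContainer\"></td>"
      + "<td>Altitude<br><span id=\"altitudeVal\" class=\"strong\">8,000 ft</span></td>"
      + "<td>Vertical Speed<br><span id=\"vspdVal\" class=\"strong\">0 fpm</span></td></tr>"
      + "</tbody></table>";

    /** Returns one "label: value" string per cell that contains a span.strong value. */
    static List<String> labelValuePairs(String html) {
        Document doc = Jsoup.parse(html);
        Element table = doc.select("table.aircraftInfoGrid").first();
        List<String> result = new ArrayList<>();
        // The enhanced for loop advances exactly once per cell
        for (Element td : table.select("td")) {
            Element value = td.select("span.strong").first();
            if (value == null) {
                continue; // icon-only cells have no value span
            }
            // ownText() returns only the cell's direct text (e.g. "Aircraft"),
            // not the text of the nested spans
            result.add(td.ownText() + ": " + value.text());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : labelValuePairs(HTML)) {
            System.out.println(line);
        }
    }
}
```

With the real page, `HTML` would be replaced by the document returned from `Jsoup.connect(...).get()`.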

EDIT: SPAN values:

    05-24 11:08:49.240: I/System.out(3679): TD Value : Aircraft
    05-24 11:08:49.240: I/System.out(3679): TD colspan : 2
    05-24 11:08:49.240: I/System.out(3679):     SPAN Value : 
    05-24 11:08:49.240: I/System.out(3679):     SPAN class : right
    05-24 11:08:49.240: I/System.out(3679):     SPAN id : aircraftIcaoVal
    05-24 11:08:49.240: I/System.out(3679):     SPAN Value : 
    05-24 11:08:49.240: I/System.out(3679):     SPAN id : aircraftVal
    05-24 11:08:49.240: I/System.out(3679):     SPAN class : strong
    05-24 11:08:49.240: I/System.out(3679): ************************************
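In this output the spans are found but their `Value` is empty. The thread does not confirm the cause; one plausible explanation is that the HTML served to Jsoup contains empty spans and the values are filled in later in the browser, for example by JavaScript. Jsoup parses HTML but does not run scripts, so such values cannot be read from the fetched document. A small sketch for checking this, assuming the values are expected in the spans with the ids shown in the question (the class `SpanValueCheck` and method `valueById` are hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: reads one value from the table by the span's id
public class SpanValueCheck {

    /** Returns the span's text, or null if the span is missing or empty in the HTML. */
    static String valueById(Document doc, String id) {
        Element span = doc.getElementById(id);
        if (span == null || span.text().isEmpty()) {
            // Missing or empty in the parsed HTML: the value may be
            // inserted by a script, which Jsoup does not execute
            return null;
        }
        return span.text();
    }

    public static void main(String[] args) {
        // Static HTML that contains the value (as in the question's table)
        Document filled = Jsoup.parse("<span id=\"squawkVal\" class=\"strong\">7651</span>");
        // Static HTML with an empty span (as in the edited output)
        Document empty = Jsoup.parse("<span id=\"squawkVal\" class=\"strong\"></span>");

        System.out.println(valueById(filled, "squawkVal")); // 7651
        System.out.println(valueById(empty, "squawkVal"));  // null
    }
}
```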

2 answers:

Answer 0 (score: 0)

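If you want the span details, just nest the for loops. For example, the following program prints all the values of the table. I copied the given HTML into a file so that the program reads from the file; you can change it to use your URL.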

import java.io.File;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;


public class JsoupParser {

    public static void main(String[] args) throws Exception {
        File input = new File("t.html");

        Document doc = Jsoup.parse(input, "UTF-8");

        int index = 0;
        for (Element td : doc.select("td")) {
            System.out.println("************************************");
            System.out.println("Row " + (++index));
            System.out.println("************************************");
            printValuesAndAttributes("TD", td);
            for(Element span : td.select("span")){
                printValuesAndAttributes("\tSPAN", span);
            }
        }
    }

    private static void printValuesAndAttributes(String prefix, Element el) {
        System.out.println(prefix +" Value : " + el.text());
        for(Attribute attr : el.attributes()){
            System.out.println(prefix + " " + attr.getKey() + " : " + attr.getValue());
        }
    }

}

If you want the whole span tag as HTML, use the toString method; if you only want the value, use el.text() as above.

for (Element span : doc.select("td span")) {
    System.out.println(span);
}
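To make the difference concrete: `toString()` on an `Element` returns its outer HTML (the same string as `outerHtml()`), `text()` returns only the text, and `attr(...)` reads a single attribute, such as the `data-tooltip-value` that holds the converted units in the question's table. A minimal sketch using the Altitude span from the question, wrapped in a `<table>` because a bare `<td>` outside a table is dropped by the HTML parser (the class name `SpanOutputDemo` is hypothetical):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example: three ways to read the same span
public class SpanOutputDemo {

    static final String HTML =
        "<table><tr><td>Altitude<br><span id=\"altitudeVal\" class=\"strong hasTooltip\""
      + " data-tooltip-align=\"left\" data-tooltip-value=\"2,438 m\">8,000 ft</span></td></tr></table>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element span = doc.select("td span").first();

        // Whole tag with its attributes (toString() returns outerHtml())
        System.out.println(span);
        // Only the visible text
        System.out.println(span.text());              // 8,000 ft
        // A single attribute value
        System.out.println(span.attr("data-tooltip-value")); // 2,438 m
    }
}
```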

Answer 1 (score: 0)

I do it like this:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc1 = Jsoup.connect("http://localhost:8080/WebApplication1/").get();
Elements tr = doc1.select("tr");

for (int i = 0; i < tr.size(); i++) {
    Element row = tr.get(i);
    // child(n) throws IndexOutOfBoundsException on rows with fewer
    // than three cells, so skip those rows
    if (row.children().size() < 3) {
        continue;
    }
    // Print only rows whose second cell does not say the court is not sitting
    if (!row.child(1).text().contains("Court NOT in session!")) {
        System.out.println("Court Number " + row.child(0).text());
        System.out.println("Sl Number " + row.child(1).text());
        System.out.println("List Type " + row.child(2).text());
    }
}
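The index-based approach in this answer generalizes to any simple table: collect each `<tr>`'s cells into a list and then decide per row what to keep, checking the row length before indexing. A minimal sketch, assuming a plain table without nested tables; the court-list HTML and the class name `TableRows` are invented for illustration and are not from the answer's page:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example: turns every table row into a list of cell texts
public class TableRows {

    static List<List<String>> rows(Document doc) {
        List<List<String>> result = new ArrayList<>();
        for (Element tr : doc.select("tr")) {
            List<String> cells = new ArrayList<>();
            // children() returns only the row's direct <td>/<th> elements
            for (Element cell : tr.children()) {
                cells.add(cell.text());
            }
            result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample data, not taken from the original answer
        Document doc = Jsoup.parse("<table>"
            + "<tr><th>Court</th><th>Sl</th><th>List</th></tr>"
            + "<tr><td>1</td><td>12</td><td>Daily</td></tr>"
            + "<tr><td>2</td><td>Court NOT in session!</td></tr>"
            + "</table>");

        for (List<String> row : rows(doc)) {
            // Same filter as the answer, but the size is checked first
            if (row.size() >= 3 && !row.get(1).contains("Court NOT in session!")) {
                System.out.println(row);
            }
        }
    }
}
```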

To select elements with specific attributes, see the jsoup selector syntax at the following link:

https://jsoup.org/apidocs/org/jsoup/select/Selector.html
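A few selectors from that page applied to part of the question's table, as a sketch: `#id`, `[attr$=suffix]` (attribute ends with), `[attr]` (attribute present) and `:has(...)` (contains a matching descendant). The shortened HTML and the class name `SelectorDemo` are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical example: selector-based lookups on part of the question's table
public class SelectorDemo {

    static final String HTML = "<table class=\"aircraftInfoGrid\"><tbody><tr>"
        + "<td class=\"iconContainer\"></td>"
        + "<td>Speed<br><span id=\"speedVal\" class=\"strong hasTooltip\""
        + " data-tooltip-value=\"469 km/h, 291 mph\">253 kt</span></td>"
        + "<td>Latitude<br><span id=\"latVal\" class=\"strong\">51.593</span></td>"
        + "</tr></tbody></table>";

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        // #id: a single element by id
        Element lat = doc.select("#latVal").first();
        // [attr$=suffix]: every span whose id ends in "Val"
        Elements values = doc.select("span[id$=Val]");
        // [attr]: elements carrying a tooltip with converted units
        Elements withTooltip = doc.select("[data-tooltip-value]");
        // :has(...): cells containing a value span, so icon cells are skipped
        Elements valueCells = doc.select("td:has(span.strong)");

        System.out.println(lat.text());                                    // 51.593
        System.out.println(values.size());                                 // 2
        System.out.println(withTooltip.first().attr("data-tooltip-value")); // 469 km/h, 291 mph
        System.out.println(valueCells.size());                             // 2
    }
}
```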