Question

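I am trying to use JSoup to extract financial information from a table. I have reviewed similar questions and can get their examples to work (for example, Using JSoup To Extract HTML Table Contents).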

I am not sure why the code does not work for my URL.

Below are three different attempts. Any help would be greatly appreciated.

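// Imports needed by the attempts below
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Page whose valuation table I want to read
String s = "http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=usa&culture=en-US";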

// Attempt 1
try {
    Document doc = Jsoup.connect("http://financials.morningstar.com/valuation/price-ratio.html?t=AXP&region=USA&culture=en_US").get();

    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() > 6) {
                System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

// Attempt 2
try {
    Document doc = Jsoup.connect(s).get(); 
    for (Element table : doc.select("table#currentValuationTable.r_table1.text2")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            for (int i = 0; i < tds.size(); i++) {
                System.out.println(tds.get(i).text());
            }
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

// Attempt 3
try {
    Document doc = Jsoup.connect(s).get(); 
    Elements tableElements = doc.select("table#currentValuationTable.r_table1.text2");

    Elements tableRowElements = tableElements.select(":not(thead) tr");

    for (int i = 0; i < tableRowElements.size(); i++) {
        Element row = tableRowElements.get(i);
        System.out.println("row");
        Elements rowItems = row.select("td");
        for (int j = 0; j < rowItems.size(); j++) {
            System.out.println(rowItems.get(j).text());
        }
    }
} catch (IOException ex) {
    ex.printStackTrace();
}

Answer 1

Answer provided by Psherno:

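Print what the document was actually able to read from the page (using System.out.println(doc);). Something tells me your problem is that the HTML content you are looking for is added dynamically by the browser via JavaScript, which Jsoup cannot do because it has no JavaScript support. In that case you should use a more powerful tool, such as a web driver (for example Selenium).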
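One rough way to confirm that diagnosis is to check whether the HTML the server returns already contains rows inside the target table. The sketch below is only an illustration: the class name StaticTableCheck, the helper tableHasRows, and both sample HTML strings are hypothetical and are not taken from the real Morningstar page. It uses plain string searching so it runs without any library. With Jsoup itself, a comparable check would be whether doc.select("table#currentValuationTable tr") comes back empty.

```java
import java.util.Locale;

public class StaticTableCheck {

    /**
     * Crude check: does the raw HTML contain a table with the given id
     * that already has at least one <tr> before its closing tag?
     * Assumes the id is written as id="..." with double quotes and that
     * tables are not nested. This is not a real HTML parser.
     */
    public static boolean tableHasRows(String html, String tableId) {
        String lower = html.toLowerCase(Locale.ROOT);
        int idPos = lower.indexOf("id=\"" + tableId.toLowerCase(Locale.ROOT) + "\"");
        if (idPos < 0) {
            return false; // the table is not in the static HTML at all
        }
        int end = lower.indexOf("</table", idPos);
        if (end < 0) {
            end = lower.length();
        }
        int firstRow = lower.indexOf("<tr", idPos);
        return firstRow >= 0 && firstRow < end;
    }

    public static void main(String[] args) {
        // Hypothetical sample: rows are already present in the server response.
        String filledByServer =
            "<table id=\"currentValuationTable\"><tr><td>label</td><td>value</td></tr></table>";
        // Hypothetical sample: empty table that a script would fill in later.
        String filledByScript =
            "<table id=\"currentValuationTable\"></table><script>/* rows added later */</script>";

        System.out.println(tableHasRows(filledByServer, "currentValuationTable")); // true
        System.out.println(tableHasRows(filledByScript, "currentValuationTable")); // false
    }
}
```

If the check returns false for the HTML that Jsoup downloaded while the browser shows a populated table, the rows are most likely being added by JavaScript after the page loads, which matches the explanation above.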
