Question

I am trying to use JSoup to retrieve a team's win total from a Sports Reference table.

Specifically, I am trying to retrieve the data point highlighted below; the HTML code is provided as well.

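Below is what I have tried so far. I get a NullPointerException when I try to access the element's text, which tells me that my code is probably not parsing the HTML correctly.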

__init__

What I want is the element's text, which should be 34 (or some other number, depending on the team's number of wins).

Answer 1

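Check what your Document was actually able to read from the page, and print it. If the content you need is added to the page dynamically by JavaScript in the browser, Jsoup will not see it, and you need a tool such as Selenium instead.

To read the HTML source, you can write something like: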

import java.io.IOException;
import org.jsoup.Jsoup;

public class JSoupHTMLSourceEx {
    public static void main(String[] args) throws IOException {
        String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        String html = Jsoup.connect(webPage).get().html();
        System.out.println(html);
    }
}
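
Once the source is printed, it can help to check whether the table markup is present at all, and where. The sketch below is only an illustration: the class `HtmlSourceCheck` and its two helpers are hypothetical names, and the sample strings are made up rather than copied from the real page. It assumes one possible cause, not confirmed in this thread, that the table is shipped inside an HTML comment. In that case a CSS selector would not match it even though the text appears in the source.

```java
class HtmlSourceCheck {

    /** Returns true if the marker occurs anywhere in the raw HTML. */
    static boolean contains(String html, String marker) {
        return html.indexOf(marker) >= 0;
    }

    /**
     * Returns true if the first occurrence of the marker lies between
     * an opening "<!--" and its closing "-->".
     */
    static boolean insideComment(String html, String marker) {
        int pos = html.indexOf(marker);
        if (pos < 0) {
            return false; // marker is not in the source at all
        }
        // Last comment opener that starts before the marker.
        int open = html.lastIndexOf("<!--", pos);
        if (open < 0) {
            return false; // no comment starts before the marker
        }
        // First comment closer after that opener.
        int close = html.indexOf("-->", open + 4);
        // Still inside the comment if it closes after the marker (or never).
        return close < 0 || close > pos;
    }

    public static void main(String[] args) {
        String marker = "id=\"team_misc\"";
        // Made-up samples: one with the table in plain markup, one commented out.
        String visible = "<div><table id=\"team_misc\"><tr><td>34</td></tr></table></div>";
        String hidden = "<div><!-- <table id=\"team_misc\"><tr><td>34</td></tr></table> --></div>";

        System.out.println(contains(visible, marker) + " " + insideComment(visible, marker));
        System.out.println(contains(hidden, marker) + " " + insideComment(hidden, marker));
    }
}
```

To use it on the real page, pass the `html` string from the program above instead of the sample strings. If `contains` returns false, the table is probably added later by JavaScript; if `insideComment` returns true, the markup is in the source but hidden from selectors.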

Since Jsoup supports CSS selectors, you can try to get the element with something like this:

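import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupSelectorEx {
    public static void main(String[] args) throws IOException {
        String webPage = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        // get() already returns a parsed Document, so no separate parse step is needed.
        Document document = Jsoup.connect(webPage).get();
        // In the first row of the "team_misc" table, select the td that is the row's second child.
        Elements tds = document.select("#team_misc > tbody > tr:nth-child(1) > td:nth-child(2)");
        // Prints nothing if the selector matches no element.
        for (Element e : tds) {
            System.out.println(e.text());
        }
    }
}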
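
The NullPointerException described in the question is consistent with a selector that matches nothing: in Jsoup, `selectFirst` returns `null` when there is no match, so calling `text()` on the result fails. The following is a minimal sketch of a null-safe lookup. The class `SafeSelect`, the helper `firstText`, and the inline sample table are hypothetical and stand in for the real page.

```java
import java.util.Optional;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SafeSelect {

    /** Returns the text of the first match, or an empty Optional if nothing matches. */
    static Optional<String> firstText(Document doc, String cssQuery) {
        Element match = doc.selectFirst(cssQuery); // null when the selector matches nothing
        return Optional.ofNullable(match).map(Element::text);
    }

    public static void main(String[] args) {
        // Made-up sample markup, parsed offline so no network access is needed.
        String html = "<table id=\"team_misc\"><tbody>"
                + "<tr><th>Team</th><td>34</td><td>48</td></tr>"
                + "</tbody></table>";
        Document doc = Jsoup.parse(html);

        String query = "#team_misc > tbody > tr:nth-child(1) > td:nth-child(2)";
        System.out.println(firstText(doc, query).orElse("no match"));
        System.out.println(firstText(doc, "#no_such_table td").orElse("no match"));
    }
}
```

An empty result tells you the selector did not match the Document that Jsoup parsed, which points back to checking the downloaded source rather than the selector syntax alone.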

A better solution, however, is to use Selenium, a portable framework for testing web applications:

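import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class SeleniumEx {
    public static void main(String[] args) {
        String baseUrl = "https://www.basketball-reference.com/teams/CHI/2020.html#all_team_misc";
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get(baseUrl);
            // Single quotes inside the XPath avoid clashing with Java's double-quoted string.
            String innerText = driver.findElement(
                By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();
            System.out.println(innerText);
        } finally {
            // Close the browser even if the lookup throws.
            driver.quit();
        }
    }
}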

Alternatively, instead of:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getText();

you can try this form:

driver.findElement(By.xpath("//*[@id='team_misc']/tbody/tr[1]/td[1]")).getAttribute("innerHTML");

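P.S. In the future, it would be useful to include a link to the page you want to get information from, or at least a fragment of the DOM structure as text rather than an image.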

How can I retrieve data from a Sports Reference data table using JSoup?
