Question

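I am trying to access elements in an HTML file on Android. I retrieved the document using Volley (a StringRequest) and am now trying to parse it with jsoup.

The HTML document contains markup like the following: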

<div class="theProducts"> 
    <h3>
        <a href="http://www.myproduct.com/myproduct.html" >
            This is the product information I want to access
            <img src="http://prettypictures.myproduct.com/myproduct.jpg" alt=""  />
        </a>
    </h3>
</div>

I can access the "products" contained in the document by doing the following:

    Document doc = Jsoup.parse(response);

    String title = doc.title();
    Elements productElements = doc.getElementsByClass("theProducts");

    for (Element productElement : productElements) {
        //String name = productElement.attr("name");
        //String content = productElement.attr("content");
    }

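So I do, quite happily, get a collection of productElements. But I don't know how to access the specific element I want (i.e. 'This is the product information I want to access'). I can see it is nested inside the array, but it is nested deeply.

Could someone explain the correct syntax to use? I am not familiar with the DOM model, so I am a bit confused. I did try doc.getElementsByClass(theProducts.h3) and (theProducts#h3), but neither worked; I got 0 results instead.

I also tried accessing outerHtml, but that returns the entire <h3> section.

Any help is much appreciated.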

Answer 1

A simple way to get the desired elements is:

Elements els = doc.select("div.theProducts>h3>a");
for(Element el : els) {
    System.out.println(el.text());
}

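Here the first line, doc.select("div.theProducts>h3>a"), returns every anchor (a) element that is a direct child of an h3 which is itself a direct child of a div with the class theProducts. Calling text() on each anchor then gives the text inside the link.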
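To make the selector's behaviour concrete, here is a minimal, self-contained sketch that runs it against the markup from the question. The class name SelectorDemo, the helper linkTexts, and the extra "Unrelated heading link" markup are assumptions added for illustration only; they are not part of the original question or answer.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorDemo {
    // Markup from the question, plus one hypothetical h3 link outside the
    // div (added here only to show that the selector does not match it).
    static final String SAMPLE_HTML =
        "<h3><a href=\"#\">Unrelated heading link</a></h3>"
        + "<div class=\"theProducts\">"
        + "<h3><a href=\"http://www.myproduct.com/myproduct.html\">"
        + "This is the product information I want to access"
        + "<img src=\"http://prettypictures.myproduct.com/myproduct.jpg\" alt=\"\" />"
        + "</a></h3></div>";

    // Returns the text of every <a> that is a direct child of an <h3>
    // that is a direct child of <div class="theProducts">.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element link : doc.select("div.theProducts > h3 > a")) {
            texts.add(link.text()); // text() returns trimmed, whitespace-normalized text
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : linkTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

The > combinator matches direct children only. If the real page wraps the h3 in further elements, a descendant selector such as "div.theProducts h3 a" (spaces instead of >) may be the better fit.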

Edit: for more details on selector syntax, read this link.

Answer 2

After more searching, I found the answer here:

Parse the inner html tags using jSoup

I'm off to upvote it now!

Posting the answer (as found on that page) in the context of my question...

Elements headlinesCat1 = doc.getElementsByTag("h3");
for (Element headline : headlinesCat1) {
    Elements importantLinks = headline.getElementsByTag("a");
    for (Element link : importantLinks) {
        String linkHref = link.attr("href");
        String linkText = link.text(); //THIS IS THE TEXT I WANTED...
        System.out.println(linkHref);
    }
}
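Note that the tag-based loop above visits every h3 in the document, not only those inside the theProducts div. As a hedged variant, the sketch below starts from the getElementsByClass("theProducts") loop in the question and searches only within each product element. The class name ScopedLinksDemo, the helper productLinks, and the extra "Other heading" markup are hypothetical names introduced for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedLinksDemo {
    // Markup from the question, plus a hypothetical h3 link outside the div.
    static final String SAMPLE_HTML =
        "<h3><a href=\"http://example.com/other.html\">Other heading</a></h3>"
        + "<div class=\"theProducts\">"
        + "<h3><a href=\"http://www.myproduct.com/myproduct.html\">"
        + "This is the product information I want to access"
        + "<img src=\"http://prettypictures.myproduct.com/myproduct.jpg\" alt=\"\" />"
        + "</a></h3></div>";

    // Maps each product link's href to its text, in document order.
    static Map<String, String> productLinks(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> links = new LinkedHashMap<>();
        for (Element product : doc.getElementsByClass("theProducts")) {
            // select() on an Element searches only that element's subtree.
            for (Element link : product.select("h3 > a")) {
                links.put(link.attr("href"), link.text());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        productLinks(SAMPLE_HTML).forEach(
            (href, text) -> System.out.println(href + " -> " + text));
    }
}
```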

JSOUP - Accessing a specific element inside a div class
