Question

I want to print just one line, the one in the <p>I want only this line</p> tag, and ignore all the other lines.

I have the following HTML:

<div class="my value"> 
<h2>Head2</h2>

<p>&nbsp;</p>

<p><strong></strong>Date</p>

<p></p>

<h2><u>Head2</u></h2>

<p>&nbsp;</p>

<p>I want only this line</p>

<p>&nbsp;</p>

<p><strong><u></u></strong></p>

<p>&nbsp;</p>

<p>I do not want this line</p>

</div>

My Java code is:

String html = "link of the website that contains my html I have showed on top";
Document doc;
try {
    doc = Jsoup.connect(html).get();

    Elements link = doc.select("div.my.value");
    doc=Jsoup.parse(link.html());
    link =doc.select("p");
    String linkText = link.text();

    System.out.println("Link Text\n" + linkText);

} catch (IOException ex) {
    System.out.println("err: " + ex);
}

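The output is:

I want only this line I do not want this line

But I want to print only the line "I want only this line" and ignore all the other <p> </p> tags. How can I do that?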

Answer 1

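The key to getting what you want is to write a good selector. Here are some examples that work with your HTML:

1) Select by content: p:contains(I want only this line) or, if you want to be more specific, div.my p:contains(I want only this line)

2) Select by position in the DOM: div p:eq(6)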
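To make the two selectors concrete, here is a minimal sketch that runs them against the HTML from the question, inlined as a string so it works offline. The class name SelectorDemo and the helper textOf are made up for illustration. Note that :eq(6) counts all element siblings inside the div, including the two h2 tags, starting from 0.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectorDemo {
    // The HTML from the question, inlined (hypothetical constant for this sketch).
    static final String HTML =
        "<div class=\"my value\">"
        + "<h2>Head2</h2>"
        + "<p>&nbsp;</p>"
        + "<p><strong></strong>Date</p>"
        + "<p></p>"
        + "<h2><u>Head2</u></h2>"
        + "<p>&nbsp;</p>"
        + "<p>I want only this line</p>"
        + "<p>&nbsp;</p>"
        + "<p><strong><u></u></strong></p>"
        + "<p>&nbsp;</p>"
        + "<p>I do not want this line</p>"
        + "</div>";

    // Returns the combined text of every element matched by the CSS query.
    static String textOf(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        Elements matches = doc.select(cssQuery);
        return matches.text();
    }

    public static void main(String[] args) {
        // 1) Select by content.
        System.out.println(textOf("div.my p:contains(I want only this line)"));
        // 2) Select by position: the wanted <p> is the element child at index 6.
        System.out.println(textOf("div p:eq(6)"));
    }
}
```

Both calls should print only the wanted line. The position-based selector is more fragile: it breaks as soon as the page adds or removes an element before the target.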

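To get the element, I prefer this statement: Jsoup.parse(html).select("div.my p:contains(I want only this line)").first(). Here html is the markup itself; if you start from a URL, as in the question, call select on the Document returned by Jsoup.connect(url).get() instead.

Then you just need to check whether the returned element is null. Otherwise you can get a NullPointerException.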
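The null check could look roughly like this. It is only a sketch: the class FirstMatchDemo, the helper firstText, and the shortened HTML string are invented for the example and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class FirstMatchDemo {
    // Returns the text of the first match, or null when nothing matches.
    static String firstText(String html, String cssQuery) {
        Element match = Jsoup.parse(html).select(cssQuery).first();
        // first() returns null for an empty result, so guard before calling text().
        return match == null ? null : match.text();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the HTML in the question.
        String html = "<div class=\"my value\"><p>I want only this line</p>"
                    + "<p>I do not want this line</p></div>";
        System.out.println(firstText(html, "div.my p:contains(I want only this line)"));
        System.out.println(firstText(html, "div.my p:contains(no such text)"));
    }
}
```

The first call prints the wanted line. The second prints null instead of throwing, because the guard handles the case where the selector matches nothing.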
