How to extract a text value with jsoup

Date: 2017-08-23 20:32:33

Tags: jsoup selectors-api

I'm trying to extract the text value that comes after CheckBoxIsChecked="t" with this selector:
p  > w|Sdt[CheckBoxIsChecked$='t']

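But jsoup seems to ignore it, and I don't know how to read the text that follows the element. I could do this in Java code, but I'm trying to keep it generic, with something like: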

p  > w|Sdt[CheckBoxIsChecked$='t']  > first text after...
In this example, the desired value is:
I Need this value since CheckBoxIsChecked is true

<p class="MsoNormal" style="margin-bottom:0in;margin-bottom:.0001pt;line-height:normal">
<w:Sdt CheckBox="t" CheckBoxIsChecked="t" >
    <span style="font-family:&quot;MS Gothic&quot;">y</span>
</w:Sdt> I Need this value since CheckBoxIsChecked is true 
<w:Sdt CheckBox="t" CheckBoxIsChecked="f" >
    <span style="font-family:&quot;MS Gothic&quot;">n</span>
</w:Sdt> This is not needed since CheckBoxIsChecked is false 
<w:Sdt CheckBox="t" CheckBoxIsChecked="f">
    <span style="font-family:&quot;MS Gothic&quot;">n</span>
</w:Sdt> This is not needed since CheckBoxIsChecked is false<o:p/>


1 answer:

Answer 0 (score: 1)

You can use the Element.ownText() method to extract the text next to a specific tag. Below is an example based on your sample:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Example {

    public static void main(String[] args) {
        String html = "<p class=\"MsoNormal\" style=\"margin-bottom:0in;margin-bottom:.0001pt;line-height:normal\">\n" +
                "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"t\" >\n" +
                "    <span style=\"font-family:&quot;MS Gothic&quot;\">y</span>\n" +
                "</w:Sdt> I Need this value since CheckBoxIsChecked is true \n" +
                "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"f\" >\n" +
                "    <span style=\"font-family:&quot;MS Gothic&quot;\">n</span>\n" +
                "</w:Sdt> This is not needed since CheckBoxIsChecked is false \n" +
                "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"f\">\n" +
                "    <span style=\"font-family:&quot;MS Gothic&quot;\">n</span>\n" +
                "</w:Sdt> This is not needed since CheckBoxIsChecked is false<o:p/>";

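        // Parse with jsoup's HTML parser (tag and attribute names are normalized to lower case)
        Document doc = Jsoup.parse(html);

        // w|sdt is jsoup's ns|tag selector syntax for the <w:Sdt> element.
        // For each checked box, print ownText(): the element's own text nodes,
        // excluding the text of its child elements.
        doc.select("p > w|sdt[checkboxischecked=t]").forEach(it -> {
            String text = it.ownText();
            System.out.println(text);
        });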

    }
}
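In the sample markup the wanted text comes after the closing </w:Sdt> tag. When the parser closes the w:Sdt element at that tag, the text becomes a sibling text node of the element rather than part of its own text, so depending on how the tree is built, ownText() may come back empty. jsoup's select() only returns elements, never text nodes, which is why the "first text after..." step cannot be written as a pure selector and needs a little Java. A minimal sketch, assuming a recent jsoup on the classpath: the helper textAfter is hypothetical (not a jsoup API); it walks Node.nextSibling() from each matched element and returns the first non-blank TextNode, stopping if it reaches another element first. The class name TextAfterExample and the SAMPLE_HTML constant are also illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class TextAfterExample {

    // Illustrative input: the question's markup (style attribute on <p> omitted)
    static final String SAMPLE_HTML =
            "<p class=\"MsoNormal\">\n" +
            "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"t\" >\n" +
            "    <span style=\"font-family:&quot;MS Gothic&quot;\">y</span>\n" +
            "</w:Sdt> I Need this value since CheckBoxIsChecked is true \n" +
            "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"f\" >\n" +
            "    <span style=\"font-family:&quot;MS Gothic&quot;\">n</span>\n" +
            "</w:Sdt> This is not needed since CheckBoxIsChecked is false \n" +
            "<w:Sdt CheckBox=\"t\" CheckBoxIsChecked=\"f\">\n" +
            "    <span style=\"font-family:&quot;MS Gothic&quot;\">n</span>\n" +
            "</w:Sdt> This is not needed since CheckBoxIsChecked is false<o:p/>";

    // Hypothetical helper: for every element matched by cssQuery, collect the first
    // non-blank text node that follows it as a sibling.
    static List<String> textAfter(Document doc, String cssQuery) {
        List<String> result = new ArrayList<>();
        for (Element el : doc.select(cssQuery)) {
            for (Node n = el.nextSibling(); n != null; n = n.nextSibling()) {
                if (n instanceof TextNode) {
                    TextNode text = (TextNode) n;
                    if (!text.isBlank()) {
                        // text() collapses whitespace runs; trim() drops the leading/trailing space
                        result.add(text.text().trim());
                        break;
                    }
                    // Whitespace-only text node: keep looking at the next sibling
                } else {
                    // Reached an element (or other non-text node) before any text: stop
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // Same selector as in the answer: checked w:Sdt elements directly under <p>
        for (String value : textAfter(doc, "p > w|sdt[checkboxischecked=t]")) {
            System.out.println(value);
        }
    }
}
```

Because the selector is passed in as a string, the same helper can be reused with other queries, for example [checkboxischecked=f] to collect the labels of the unchecked boxes.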
