How to parse this markup with jsoup

Time: 2016-07-01 08:19:21

Tags: java jsoup

    <p class="name">
    <a href="/shop/view.php?index_no=22176&amp;cate="><strong class="title displaynone"> :</strong>T-shritsT</a> <span class="icon"></span></p>
    <ul class="xans-element- xans-product xans-product-listitem">
    <li class=" xans-record-"><strong class="title displaynone"><span style="font-size:12px;color:#555555;">price</span> :</strong> <span style="font-size:12px;color:#555555;"><s></s>$20</span></li>

From this markup, I want to get only the text "T-shritsT" and the price "$20", without the ":" and the "price" label.

Here is my code:

    Elements goods = document.select("p.name > a");
    for (Element e : goods) {
        System.out.println("------------------------------------------");
        // text() includes descendant text, so the hidden " :" label inside <strong> is printed too
        System.out.println("goods" + e.text());
    }
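One possible fix for the loop above, sketched under the assumption that the markup always has this exact shape: jsoup's `Element.ownText()` returns only an element's own text nodes, so the " :" nested in `<strong>` is skipped. For the price, the selector `li > span` matches only a `<span>` that is a direct child of `<li>`; the "price" `<span>` sits inside `<strong>` and is not matched. The class name `OwnTextExample` and the helper `extract` are hypothetical names for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class OwnTextExample {
    // The question's markup with the newlines removed.
    static final String HTML =
        "<p class=\"name\"><a href=\"/shop/view.php?index_no=22176&amp;cate=\">"
        + "<strong class=\"title displaynone\"> :</strong>T-shritsT</a>"
        + " <span class=\"icon\"></span></p>"
        + "<ul class=\"xans-element- xans-product xans-product-listitem\">"
        + "<li class=\" xans-record-\"><strong class=\"title displaynone\">"
        + "<span style=\"font-size:12px;color:#555555;\">price</span> :</strong>"
        + " <span style=\"font-size:12px;color:#555555;\"><s></s>$20</span></li></ul>";

    /** Hypothetical helper: returns {name, price}, skipping the hidden <strong> labels. */
    static String[] extract(String html) {
        Document document = Jsoup.parse(html);
        // ownText() reads only the <a>'s direct text nodes, so the " :" inside <strong> is ignored.
        String name = document.select("p.name > a").first().ownText();
        // "li > span" matches only a <span> that is a direct child of <li>;
        // the "price" <span> is nested inside <strong>, so it is not selected.
        String price = document.select("li > span").first().text();
        return new String[] {name, price};
    }

    public static void main(String[] args) {
        String[] result = extract(HTML);
        System.out.println("goods: " + result[0]); // prints "goods: T-shritsT"
        System.out.println("price: " + result[1]); // prints "price: $20"
    }
}
```

Compared with removing elements from the document, this reads each field separately, so the name and price can be stored in different variables.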

1 answer:

Answer 0 (score: 0)

Try this:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.safety.Whitelist; // renamed to org.jsoup.safety.Safelist in jsoup 1.14.2 and later

    public class Test {
        public static void main(String[] args) {
            String s = "<p class=\"name\">\n" +
                    "<a href=\"/shop/view.php?index_no=22176&amp;cate=\"><strong class=\"title displaynone\"> :</strong>T-shritsT</a> <span class=\"icon\"></span></p>\n" +
                    "<ul class=\"xans-element- xans-product xans-product-listitem\">\n" +
                    "<li class=\" xans-record-\"><strong class=\"title displaynone\"><span style=\"font-size:12px;color:#555555;\">price</span> :</strong> <span style=\"font-size:12px;color:#555555;\"><s></s>$20</span></li>";
            Document document = Jsoup.parse(s);
            // Drop the hidden " :" and "price :" labels, which all live inside <strong> elements
            document.select("strong").remove();
            // Clean the remaining HTML with a basic whitelist, re-parse it, and print only the text
            Whitelist whitelist = Whitelist.basic();
            System.out.println(Jsoup.parse(Jsoup.clean(document.toString(), whitelist)).text());
        }
    }

Output: T-shritsT $20
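For this particular markup, the `Jsoup.clean` round trip in the answer appears to be optional: once the labels are removed, `Document.text()` already joins the remaining text and puts a space between block elements such as `<p>` and `<li>`. The sketch below assumes that only the hidden labels (class `displaynone`) should go, so it selects `strong.displaynone` instead of every `<strong>` on the page. `RemoveLabelsExample` and `visibleText` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class RemoveLabelsExample {
    // The question's markup with the newlines removed.
    static final String HTML =
        "<p class=\"name\"><a href=\"/shop/view.php?index_no=22176&amp;cate=\">"
        + "<strong class=\"title displaynone\"> :</strong>T-shritsT</a>"
        + " <span class=\"icon\"></span></p>"
        + "<ul class=\"xans-element- xans-product xans-product-listitem\">"
        + "<li class=\" xans-record-\"><strong class=\"title displaynone\">"
        + "<span style=\"font-size:12px;color:#555555;\">price</span> :</strong>"
        + " <span style=\"font-size:12px;color:#555555;\"><s></s>$20</span></li></ul>";

    /** Hypothetical helper: removes the hidden labels and returns the remaining text. */
    static String visibleText(String html) {
        Document document = Jsoup.parse(html);
        // Remove only <strong> elements carrying the "displaynone" class, not all bold text.
        document.select("strong.displaynone").remove();
        // text() collapses whitespace and separates block elements with a single space.
        return document.text();
    }

    public static void main(String[] args) {
        System.out.println(visibleText(HTML)); // prints "T-shritsT $20"
    }
}
```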