Splitting an element on a tag

Time: 2015-06-24 16:35:17

Tags: java html jsoup

Suppose I have an element that looks like this:

<li> this is before <span class="between"> this is between </span> this is after </li>

How can I use JSoup to get the array {"this is before", "this is after"}?

Note: the text may contain several span elements, but only one of them has the class between. For example,

<li> 
this 
<span class="other"> is </span> 
before 
<span class="between"> this is between </span> 
this is 
<span class="other"> after </span> 
</li>

should also produce {"this is before", "this is after"}.

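1 answer:

Answer 0 (score: 1)

You can iterate over the child nodes of the li, collecting text into a buffer and starting a new buffer whenever you reach the span with the between class: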

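import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SplitOnSpan
{
    public static void main(String[] args)
    {
        final String html = "<li> \n"
                + "this \n"
                + "<span class=\"other\"> is </span> \n"
                + "before \n"
                + "<span class=\"between\"> this is between </span> \n"
                + "this is \n"
                + "<span class=\"other\"> after </span> \n"
                + "</li>";

        Document doc = Jsoup.parse(html);
        Element li = doc.select("li").first();
        List<String> text = new ArrayList<>();
        StringBuilder sb = new StringBuilder();

        for( Node node : li.childNodes() ) // Iterate over the direct child nodes
        {
            if( node instanceof TextNode ) // Plain text
            {
                // text() gives the unescaped, whitespace-normalised text;
                // toString() would give HTML (e.g. "&amp;" instead of "&")
                sb.append(((TextNode) node).text());
            }
            else if( node instanceof Element ) // Element
            {
                final Element element = (Element) node;

                // Span with the 'between' class; hasClass() also matches
                // when the span carries additional classes
                if( element.tagName().equals("span") && element.hasClass("between") )
                {
                    text.add(sb.toString().trim()); // Close the current part
                    sb = new StringBuilder();       // Start the next part
                }
                else // Every other element
                {
                    // ownText() covers only the element's direct text, not nested children
                    sb.append(element.ownText());
                }
            }
        }

        text.add(sb.toString().trim()); // Add the part after the last separator

        System.out.println(text);
    }
}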

Output:

[this is before, this is after]
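
For comparison, the same accumulate-and-flush idea can be sketched without jsoup, using the JDK's built-in DOM parser. This is only a rough sketch and it assumes the fragment is well-formed XML, which the example here happens to be; real-world HTML usually is not, and that is where jsoup remains the better fit. The class name SplitOnTag and the helpers splitOnSpan, normalise and parse are hypothetical names chosen for illustration. Unlike ownText() above, getTextContent() also picks up text nested deeper inside the non-separator elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SplitOnTag {

    // Hypothetical helper: splits the text of `parent` at every direct child
    // <span> whose class attribute is exactly `cls`.
    public static List<String> splitOnSpan(Element parent, String cls) {
        List<String> parts = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            boolean isSeparator = node instanceof Element
                    && "span".equals(node.getNodeName())
                    && cls.equals(((Element) node).getAttribute("class"));
            if (isSeparator) {
                parts.add(normalise(sb.toString())); // close the current part
                sb.setLength(0);                     // start the next part
            } else {
                // For a text node this is its text; for an element it is
                // the text of the whole subtree.
                sb.append(node.getTextContent());
            }
        }
        parts.add(normalise(sb.toString())); // part after the last separator
        return parts;
    }

    // Trims the string and collapses runs of whitespace to a single space.
    public static String normalise(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // Assumption: the input is well-formed XML with a single root element.
    public static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<li> \nthis \n<span class=\"other\"> is </span> \nbefore \n"
                + "<span class=\"between\"> this is between </span> \nthis is \n"
                + "<span class=\"other\"> after </span> \n</li>";
        // prints [this is before, this is after]
        System.out.println(splitOnSpan(parse(xml), "between"));
    }
}
```

Because the separator check runs on every child, this version returns more than two parts if several spans with the given class are present.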