How to create a filter on div content using an HTML parser in Java

Time: 2012-05-09 10:59:10

Tags: java html filter html-parsing

I am trying to parse an HTML string using the htmlparser library. The HTML looks like this:

<body>
        <div class="Level1">
            <div class="row">
                <div class="txt">
                    Date of analysis:
                </div><div class="content">
                    02/03/11
                </div>
            </div>
        </div><div class="Level1">
            <div class="row">
                <div class="txt">
                    Site:
                </div><div class="content">
                    13.0E
                </div>
            </div>
        </div><div class="Level1">
            <div class="row">
                <div class="txt">
                    Network type:
                </div><div class="content">
                    DVB-S
                </div>
            </div>
        </div>
</body>

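I need to extract the "content" value for a given "txt" label. I wrote a filter that returns the divs with class="Level1", but I don't know how to build a filter based on a div's content. That is, if the txt value is Site:, I want to read the corresponding content, e.g. 13.0E.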

  NodeList nl = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("div"), new HasAttributeFilter("class", "Level1")));
       

Can someone help me with this? How do I read a div inside a div? Thanks!

1 answer:

Answer 0: (score: 0)

NodeList nl = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("div"), new HasAttributeFilter("class", "Level1")));

It would be better to do it like this:

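// Needs org.htmlparser.Node, org.htmlparser.util.NodeList and org.htmlparser.filters.*
NodeList nl = parser.parse(null); // you can also pass a filter here

// Pass true to search recursively: the "txt" divs are nested inside other divs,
// and the one-argument overload only checks the top-level nodes of the list.
NodeList divs = nl.extractAllNodesThatMatch(
  new AndFilter(new TagNameFilter("DIV"), 
    new HasAttributeFilter("class", "txt")), true);

if( divs.size() > 0 ) {
  Node div = divs.elementAt(0); // elementAt returns a Node
  // toPlainTextString() gives the text inside the div;
  // getText() on a tag gives only the tag's own name and attributes
  String text = div.toPlainTextString().trim();
}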
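
The pairing the question asks about (find the "txt" div with a given label, then read the "content" div next to it) can also be sketched without htmlparser. The example below swaps in the JDK's built-in XML parser and XPath, which only works under the assumption that the markup is well-formed XML, as the sample in the question happens to be. The class `DivContentLookup` and the method `contentFor` are hypothetical names made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of htmlparser: uses only the JDK's XML and XPath APIs.
final class DivContentLookup {

    // Select the "txt" div whose trimmed text equals $label, then take the
    // nearest following sibling div with class "content" and trim its text.
    private static final String QUERY =
        "normalize-space(//div[@class='txt'][normalize-space(.)=$label]"
        + "/following-sibling::div[@class='content'][1])";

    /** Returns the content paired with the label, or "" if there is no match. */
    static String contentFor(String html, String label) {
        try {
            // Assumption: the markup is well-formed XML, as the sample in the question is.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(html)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $label through a resolver instead of concatenating it into the query.
            xpath.setXPathVariableResolver(name ->
                "label".equals(name.getLocalPart()) ? label : null);
            return xpath.evaluate(QUERY, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query the markup", e);
        }
    }

    public static void main(String[] args) {
        String html = "<body><div class=\"Level1\"><div class=\"row\">"
            + "<div class=\"txt\"> Site: </div><div class=\"content\"> 13.0E </div>"
            + "</div></div></body>";
        System.out.println(contentFor(html, "Site:"));
    }
}
```

Real-world HTML is often not well-formed XML, so for arbitrary pages a tolerant HTML parser such as htmlparser is still the safer choice; the same idea applies there: locate the "txt" div first, then move to its following sibling.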