Question

I use htmlparser to parse web page data for books in a library. The code snippet is as follows:

NodeList nodes=parser.extractAllNodesThatMatch(filter);
for(int i = 0;i < nodes.size();i++)
{
    Bookinfo cur_bki=new Bookinfo();
    Parser getboolinfo=new Parser(nodes.elementAt(i).toHtml());
    String doc_type=getboolinfo.extractAllNodesThatMatch(new NodeFilter() 
                {
                    @Override
                    public boolean accept(Node node)
                    {
                        return ((node instanceof Tag)
                                && !((Tag)node).isEndTag()
                                && ((Tag)node).getTagName().equals("SPAN")
                                && ((Tag)node).getAttribute("class") != null
                                && ((Tag)node).getAttribute("class").equals());
                    }
                }).elementAt(0).toPlainTextString();
    message("extract doc_type"+doc_type);
    cur_bki.set_doc_type(doc_type);
    getboolinfo=new Parser(nodes.elementAt(i).toHtml());

    TagNameFilter filtertemp=new TagNameFilter("a");
    NodeList tempp = getboolinfo.extractAllNodesThatMatch(filtertemp);
    String title=getboolinfo.extractAllNodesThatMatch(new NodeFilter() 
                {
                    @Override
                    public boolean accept(Node node)
                    {
                        return ((node instanceof Tag)
                                && !((Tag)node).isEndTag()
                                && ((Tag)node).getTagName().equals("A"));
                    }
                }).elementAt(0).toPlainTextString();
    message("extract title"+title);
}

If I delete the line

getboolinfo=new Parser(nodes.elementAt(i).toHtml());

then title comes back empty. That suggests I can only call extractAllNodesThatMatch once per Parser instance, but why? Can you help me?
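
One likely explanation, offered as an assumption rather than a confirmed reading of the htmlparser source: the Parser reads its input through a forward-only cursor, so the first extractAllNodesThatMatch call walks to the end of the input and a second call on the same instance has nothing left to read. The sketch below reproduces that behaviour with a hypothetical stand-in class (OneShotScanner, extractAll and the token strings are invented for illustration and are not part of htmlparser):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parser that reads its input through a forward-only cursor. */
class OneShotScanner {
    private final String[] tokens;
    private int cursor = 0; // advances as tokens are consumed; never rewound implicitly

    OneShotScanner(String input) {
        this.tokens = input.trim().split("\\s+");
    }

    /** Collects every remaining token that starts with the prefix, consuming the input. */
    List<String> extractAll(String prefix) {
        List<String> matches = new ArrayList<>();
        while (cursor < tokens.length) {
            String token = tokens[cursor++];
            if (token.startsWith(prefix)) {
                matches.add(token);
            }
        }
        return matches;
    }

    /** Rewinds the cursor so the input can be scanned again. */
    void reset() {
        cursor = 0;
    }
}

class OneShotDemo {
    public static void main(String[] args) {
        OneShotScanner scanner = new OneShotScanner("span:book a:title a:author");
        System.out.println(scanner.extractAll("span")); // [span:book]
        System.out.println(scanner.extractAll("a"));    // [] because the cursor is already at the end
        scanner.reset();
        System.out.println(scanner.extractAll("a"));    // [a:title, a:author]
    }
}
```

If the same thing is happening here, creating a new Parser (as the code above does) is one way around it. htmlparser's Parser also appears to offer a reset() method for rewinding, and a NodeList obtained from a single parse can be filtered repeatedly with its own extractAllNodesThatMatch; both are worth checking against the documentation for the version in use.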

Why can htmlparser's extractAllNodesThatMatch function only be used once?

0 answers: