Question

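Hello everyone, this is my first question, and I am not a programmer.

I want to generate a sitemap. I am crawling a website with webcrawler (crawler.dev.java.net). Is there a way to use a SAX parser on the data I get?

I also used JTidy, and I got the home page's HTML data converted into an XML file.

I am very confused: there are many SAX parsers, and I don't know the differences between them or which one to choose.

I want to access the attributes of HTML tags. Either I can't do that with webcrawler, or I don't know how to.

What is the difference between org.xml.sax and all the other packages?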

Answer 1

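Java provides a standard way to interact with SAX parsers through JAXP (see the code below). To switch between SAX parsers, you usually only need to add the parser's jar to the classpath; the code stays the same.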
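As a quick aside, one way to see which SAX implementation JAXP actually selected is to print the class name of the parser it returns. This is only a minimal sketch: the class name `WhichParser` and the helper `parserClassName` are made up for illustration, and the name printed depends on your JDK and classpath.

```java
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class WhichParser {

    // Returns the class name of the SAXParser implementation JAXP selected.
    // (Hypothetical helper, not part of any library.)
    static String parserClassName() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        return sp.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The value varies with the JDK and with any parser jar on the classpath.
        System.out.println(parserClassName());
    }
}
```

If you add another parser's jar to the classpath and run this again, the printed name would normally change while the code stays untouched.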

You can do SAX parsing as follows:

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class Demo {

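    public static void main(String[] args) throws Exception {
        // JAXP picks whichever SAX implementation is on the classpath.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        XMLReader xmlReader = sp.getXMLReader();
        xmlReader.setContentHandler(new MyContentHandler());
        // Assumption: args[0] is the system identifier (a URI such as a
        // file: or http: URL) of the XML document to parse.
        xmlReader.parse(args[0]);
    }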

    private static class MyContentHandler implements ContentHandler {

        public void setDocumentLocator(Locator locator) {
        }

        public void startDocument() throws SAXException {
        }

        public void endDocument() throws SAXException {
        }

        public void startPrefixMapping(String prefix, String uri)
                throws SAXException {
        }

        public void endPrefixMapping(String prefix) throws SAXException {
        }

        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            // Called once per opening tag; atts holds that tag's attributes,
            // e.g. atts.getValue("href").
        }

        public void endElement(String uri, String localName, String qName)
                throws SAXException {
        }

        public void characters(char[] ch, int start, int length)
                throws SAXException {
        }

        public void ignorableWhitespace(char[] ch, int start, int length)
                throws SAXException {
        }

        public void processingInstruction(String target, String data)
                throws SAXException {
        }

        public void skippedEntity(String name) throws SAXException {
        }

    }

}
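
Since the question asks about reading the attributes of HTML tags for a sitemap, here is a minimal sketch of how that might look. It assumes the HTML has already been converted to well-formed XML (for example by JTidy), and it extends `DefaultHandler` so that only `startElement` needs to be written. The class name `LinkCollector`, the `collect` helper, and the sample page are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class LinkCollector extends DefaultHandler {

    // href values seen so far, in document order
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        // The factory is not namespace-aware by default, so qName holds the tag name.
        if ("a".equalsIgnoreCase(qName)) {
            String href = atts.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    public List<String> getLinks() {
        return links;
    }

    // Hypothetical helper: parses well-formed XHTML text and returns
    // the href of every <a> element that has one.
    static List<String> collect(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        LinkCollector handler = new LinkCollector();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getLinks();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page standing in for JTidy output.
        String page = "<html><body>"
                + "<a href=\"/about.html\">About</a>"
                + "<a name=\"top\">no link</a>"
                + "<a href=\"/contact.html\">Contact</a>"
                + "</body></html>";
        for (String link : collect(page)) {
            System.out.println(link);
        }
    }
}
```

The collected links could then be fed back to the crawler or written out as sitemap entries; how to do that depends on the crawler's API, which is not shown here.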

Sitemap using SAX and webcrawler
