SAX parsing: encountered mixed content within a text element

Time: 2013-02-04 01:52:04

Tags: java xml-parsing sax

I'm trying to parse an XML file similar to the following (it represents a TV guide)...

<?xml version="1.0" encoding="utf-8"?>
<channels>
  <channel>
    <name>BBC ONE</name>
    <oid>10029</oid>
      ...
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:40:00</end_time>
        <episode>9</episode>
        <genres>Entertainment</genres>
        <oid>10583734</oid>
        <season>8</season>
        <start_time>2013-02-04 00:15:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
      <programme>
        ..
      </programme>
    </programmes>
  </channel>
  <channel>
    ...
  </channel>
</channels>

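I'm using two parsers, one for channels and one for programmes, but obviously this means I need to retrieve the whole <programmes>...</programmes> block in order to pass it to the 'programme' parser.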

I tried the following in the 'channel' parser...

public List<XMLTVChannel> parse() {
    RootElement rootElement = new RootElement("channels");
    final List<XMLTVChannel> channelsList = new ArrayList<XMLTVChannel>();
    Element channelElement = rootElement.getChild("channel");

    ...

    // Set the EndTextElementListeners for the <channel> child elements
    channelElement.getChild(CHANNEL_OID).setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            currentChannel.setOid(body);
        }
    });

    ...

    // HERE'S THE PROBLEM
    channelElement.getChild("programmes").setEndTextElementListener(new EndTextElementListener() {
        public void end(String body) {
            // NEED TO INVOKE XMLTVProgrammeParser HERE
        }
    });
    try {
        Xml.parse(getInputStream(), Xml.Encoding.UTF_8, rootElement.getContentHandler());
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
    return channelsList;
}

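OK, so I googled it and I know exactly what the problem is: the String body parameter passed to the end(...) method is supposed to contain only text, whereas here it is a mixture of elements and text.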
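Even if a text listener were allowed on <programmes>, the body would not be something you could hand to a second parser, because SAX reports only character data and the child tags never reach the handler as text. android.sax is not available on a plain JVM, so this minimal sketch uses the standard org.xml.sax API instead; TextBodyDemo and textInsideProgrammes are hypothetical names used only for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: collect all character data reported inside <programmes>
// to show what a "text body" for a container element actually contains.
public class TextBodyDemo {

    public static String textInsideProgrammes(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // > 0 while inside <programmes>

            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                if (qName.equals("programmes") || depth > 0) {
                    depth++;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth > 0) {
                    depth--;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    text.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><programmes><programme>"
                + "<title>American Dad</title><season>14</season>"
                + "</programme></programmes></channel>";
        // Prints "American Dad14": the text survives, the markup does not.
        System.out.println(textInsideProgrammes(xml));
    }
}
```

The point of the sketch is that whatever a text listener receives has already lost the element structure, so re-parsing it as XML cannot work.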

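I've read a few similar Stack Overflow questions and articles suggesting that I need to define my own ContentHandler, but I haven't found anything similar to what I'm trying to do. Is a custom ContentHandler my only option, or is there another way?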
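A custom ContentHandler does not have to mean one giant parser. One way to keep the two-parser design without ever buffering <programmes> as a string is to let the channel-level handler forward SAX events to a programme-level handler while it is inside <programmes>. (Within android.sax itself, registering listeners on channelElement.getChild("programmes").getChild("programme") and its children, rather than a text listener on <programmes>, may be the simpler route.) This is a minimal sketch on the standard org.xml.sax API; DelegatingDemo, ChannelHandler, ProgrammeHandler and parse are hypothetical names, and only names and titles are collected to keep it short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: ProgrammeHandler plays the "programme" parser and
// ChannelHandler the "channel" parser from the question.
public class DelegatingDemo {

    // Sees only the events from inside <programmes>; keeps each <title>.
    public static class ProgrammeHandler extends DefaultHandler {
        public final List<String> titles = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    // Handles channel-level elements and forwards everything inside <programmes>.
    public static class ChannelHandler extends DefaultHandler {
        public final List<String> names = new ArrayList<>();
        public final ProgrammeHandler programmes = new ProgrammeHandler();
        private final StringBuilder text = new StringBuilder();
        private boolean delegating = false; // true between <programmes> and </programmes>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (delegating) {
                programmes.startElement(uri, localName, qName, atts);
            } else if (qName.equals("programmes")) {
                delegating = true;
            }
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (delegating) {
                programmes.characters(ch, start, length);
            } else {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (delegating && qName.equals("programmes")) {
                delegating = false; // back at channel level
            } else if (delegating) {
                programmes.endElement(uri, localName, qName);
            } else if (qName.equals("name")) {
                names.add(text.toString().trim());
            }
            text.setLength(0);
        }
    }

    public static ChannelHandler parse(String xml) throws Exception {
        ChannelHandler handler = new ChannelHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        ChannelHandler h = parse("<channels><channel><name>BBC TWO</name><programmes>"
                + "<programme><title>American Dad</title></programme>"
                + "</programmes></channel></channels>");
        System.out.println(h.names + " " + h.programmes.titles); // [BBC TWO] [American Dad]
    }
}
```

The design choice is that neither handler knows about the other's elements: the channel handler only decides when to switch delegation on and off, so the programme handler can be written and tested on its own.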

1 answer:

Answer 0 (score: 3)

Do you mean you want this output:

BBC ONE
10029
------------------------
The Celebrity Apprentice USA
2013-02-04 00:15:00 - 2013-02-04 01:40:00
Entertainment
Season : 8 / Episode : 9
Description:
Blah blah blah
10583734
**********************
The Celebrity Apprentice USA
2013-02-04 01:45:00 - 2013-02-04 02:25:00
Entertainment
Season : 8 / Episode : 10
Description:
Blah blah blah
10583735
**********************
//////////////////////////
BBC TWO
10030
------------------------
American Dad
2013-02-04 00:30:00 - 2013-02-04 01:25:00
Cartoon
Season : 14 / Episode : 1
Description:
Blah blah blah
10583734
**********************
American Dad
2013-02-04 01:30:00 - 2013-02-04 02:15:00
Cartoon
Season : 14 / Episode : 2
Description:
Blah blah blah
10583735
**********************
//////////////////////////

I've modified your XML file:

<?xml version="1.0" encoding="utf-8"?>
<channels>
  <channel>
    <name>BBC ONE</name>
    <oid>10029</oid>
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:40:00</end_time>
        <episode>9</episode>
        <genres>Entertainment</genres>
        <oid>10583734</oid>
        <season>8</season>
        <start_time>2013-02-04 00:15:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 02:25:00</end_time>
        <episode>10</episode>
        <genres>Entertainment</genres>
        <oid>10583735</oid>
        <season>8</season>
        <start_time>2013-02-04 01:45:00</start_time>
        <title>The Celebrity Apprentice USA</title>
      </programme>
    </programmes>
  </channel>
  <channel>
    <name>BBC TWO</name>
    <oid>10030</oid>
    <programmes>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 01:25:00</end_time>
        <episode>1</episode>
        <genres>Cartoon</genres>
        <oid>10583734</oid>
        <season>14</season>
        <start_time>2013-02-04 00:30:00</start_time>
        <title>American Dad</title>
      </programme>
      <programme>
        <description>Blah blah blah</description>
        <end_time>2013-02-04 02:15:00</end_time>
        <episode>2</episode>
        <genres>Cartoon</genres>
        <oid>10583735</oid>
        <season>14</season>
        <start_time>2013-02-04 01:30:00</start_time>
        <title>American Dad</title>
      </programme>
    </programmes>
  </channel>
</channels>

Java classes:

Channel

import java.util.ArrayList;

public class Channel {

    private String name;
    private String oid;
    // Initialised empty so callers can iterate even if a <channel> has no <programmes>
    private ArrayList<Programme> alProgrammes = new ArrayList<>();

    public Channel() {}

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getOid() {
        return oid;
    }

    public void setOid(String oid) {
        this.oid = oid;
    }

    public ArrayList<Programme> getAlProgrammes() {
        return alProgrammes;
    }

    public void setAlProgrammes(ArrayList<Programme> alProgrammes) {
        this.alProgrammes = alProgrammes;
    }
}

Programme

public class Programme {

    private String description;
    private String end_time;
    private String episode;
    private String genres;
    private String oid;
    private String season;
    private String start_time;
    private String title;



    public Programme() {
    }

    //Getters / Setters
    public String getDescription() {
        return description;
    }
    public void setDescription(String description) {
        this.description = description;
    }
    public String getEnd_time() {
        return end_time;
    }
    public void setEnd_time(String end_time) {
        this.end_time = end_time;
    }
    public String getEpisode() {
        return episode;
    }
    public void setEpisode(String episode) {
        this.episode = episode;
    }
    public String getGenres() {
        return genres;
    }
    public void setGenres(String genres) {
        this.genres = genres;
    }
    public String getOid() {
        return oid;
    }
    public void setOid(String oid) {
        this.oid = oid;
    }
    public String getSeason() {
        return season;
    }
    public void setSeason(String season) {
        this.season = season;
    }
    public String getStart_time() {
        return start_time;
    }
    public void setStart_time(String start_time) {
        this.start_time = start_time;
    }
    public String getTitle() {
        return title;
    }
    public void setTitle(String title) {
        this.title = title;
    }

}

XMLManager

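import java.io.File;
import java.io.IOException;
import java.util.ArrayList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public final class XMLManager {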

    public static ArrayList<Channel> getAlChannels(){

          DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
          DocumentBuilder db = null;
          Document doc = null;
          ArrayList<Channel> alChannels = new ArrayList<>();

          try {

            db = dbf.newDocumentBuilder();
            doc = db.parse(new File("D:\\Loic_Workspace\\Test2\\res\\test.xml"));
            NodeList ndListChannels = doc.getElementsByTagName("channel");

            int channelsCount = ndListChannels.getLength();
            NodeList ndListChannel = null;
            int ndListChannelLength = 0;
            Channel channel = null;
            NodeList ndListProgrammes = null;
            for(int i=0;i<channelsCount;i++){

                ndListChannel = ndListChannels.item(i).getChildNodes();
                ndListChannelLength = ndListChannel.getLength();
                channel = new Channel();
                for(int j=0;j<ndListChannelLength;j++){

                    Node currentNode = ndListChannel.item(j);
                    String currentNodeName = currentNode.getNodeName();
                    String value = currentNode.getTextContent();

                    if(currentNodeName.equals("name")){
                        channel.setName(value);
                    }
                    if(currentNodeName.equals("oid")){
                        channel.setOid(value);
                    }
                    if(currentNodeName.equals("programmes")){
                        ndListProgrammes = currentNode.getChildNodes();
                        ArrayList<Programme> alProgrammes = new ArrayList<>();
                        for(int k=0;k<ndListProgrammes.getLength();k++){

                            Node ndProgrammes = ndListProgrammes.item(k);
                            if(ndProgrammes.hasChildNodes()){

                                NodeList ndListProgramme = ndProgrammes.getChildNodes();
                                int ndListProgrammeLength = ndListProgramme.getLength();
                                Programme programme = new Programme();
                                for(int l=0;l<ndListProgrammeLength;l++){

                                    Node  ndProgramme = ndListProgramme.item(l);
                                    String nodeProgrameName = ndProgramme.getNodeName();
                                    String nodeProgrameValue = ndProgramme.getTextContent();
                                    if(nodeProgrameName.equals("description")){
                                        programme.setDescription(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("end_time")){

                                        programme.setEnd_time(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("episode")){
                                        programme.setEpisode(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("genres")){
                                        programme.setGenres(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("oid")){
                                        programme.setOid(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("season")){
                                        programme.setSeason(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("start_time")){
                                        programme.setStart_time(nodeProgrameValue);
                                    }
                                    if(nodeProgrameName.equals("title")){
                                        programme.setTitle(nodeProgrameValue);
                                    }

                                }

                                alProgrammes.add(programme);

                            }

                        }

                        channel.setAlProgrammes(alProgrammes);

                    }

                }

                alChannels.add(channel);

            }



          } catch (ParserConfigurationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

          return alChannels;

    }



}

import java.util.ArrayList;

public class MyMain {

    /**
     * @param args
     */
    public static void main(String[] args) {


        ArrayList<Channel> alChannels = XMLManager.getAlChannels();
        for(Channel c:alChannels){
            System.out.println(c.getName());
            System.out.println(c.getOid());
            System.out.println("------------------------");
            for(Programme p:c.getAlProgrammes()){
                System.out.println(p.getTitle());
                System.out.println(p.getStart_time()+" - "+p.getEnd_time());
                System.out.println(p.getGenres());
                System.out.println("Season : "+p.getSeason()+" / Episode : "+p.getEpisode());
                System.out.println("Description:\n"+p.getDescription());
                System.out.println(p.getOid());
                System.out.println("**********************");
            }

            System.out.println("//////////////////////////");

        }

    }

}
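The nested loops in getAlChannels() can be shortened with a small helper that returns the text of a direct child element. This is a minimal sketch assuming the standard JDK DOM API; DomChildText, childText and parseElement are hypothetical names. It deliberately walks only direct children, because Element.getElementsByTagName searches all descendants, so on a <channel> it would also match every programme's <oid>.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomChildText {

    // Text of the first *direct* child element named `tag`, or null if absent.
    public static String childText(Element parent, String tag) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(tag)) {
                return n.getTextContent();
            }
        }
        return null;
    }

    // Parses an XML string and returns its root element.
    public static Element parseElement(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element channel = parseElement("<channel><oid>10029</oid><programmes>"
                + "<programme><oid>10583734</oid></programme></programmes></channel>");
        System.out.println(childText(channel, "oid"));                      // 10029
        System.out.println(channel.getElementsByTagName("oid").getLength()); // 2
    }
}
```

With such a helper, each programme could be filled with calls like programme.setTitle(childText(programmeElement, "title")) instead of one if per child node.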

Update

Here is an example of how I would do it with SAX.

Important: I kept the Programme and Channel classes.

ChannelsHandler

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ChannelsHandler extends DefaultHandler {

    private ArrayList<Channel> tvGuide;
    private Channel channel;
    private ArrayList<Programme> alProgrammes;
    private Programme programme;
    // characters() may deliver one text node in several chunks, so the text
    // of the current element is accumulated here rather than overwritten.
    private final StringBuilder reading = new StringBuilder();

    public ChannelsHandler() {
        super();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {

        reading.setLength(0); // new element: start collecting text afresh

        if (qName.equals("channels")) {
            tvGuide = new ArrayList<>();
        } else if (qName.equals("channel")) {
            channel = new Channel();
        } else if (qName.equals("programmes")) {
            alProgrammes = new ArrayList<>();
        } else if (qName.equals("programme")) {
            programme = new Programme();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        reading.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {

        String text = reading.toString().trim();

        if (qName.equals("channel")) {
            tvGuide.add(channel);
            channel = null;
        } else if (qName.equals("name")) {
            channel.setName(text);
        } else if (qName.equals("programmes")) {
            channel.setAlProgrammes(alProgrammes);
            alProgrammes = new ArrayList<>();
        } else if (qName.equals("programme")) {
            alProgrammes.add(programme);
            programme = null;
        } else if (qName.equals("description")) {
            programme.setDescription(text);
        } else if (qName.equals("end_time")) {
            programme.setEnd_time(text);
        } else if (qName.equals("episode")) {
            programme.setEpisode(text);
        } else if (qName.equals("genres")) {
            programme.setGenres(text);
        } else if (qName.equals("season")) {
            programme.setSeason(text);
        } else if (qName.equals("start_time")) {
            programme.setStart_time(text);
        } else if (qName.equals("title")) {
            programme.setTitle(text);
        }

        reading.setLength(0); // drop whitespace between sibling elements
    }

    public ArrayList<Channel> getTVGuide() {
        return tvGuide;
    }
}

My new main

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

public class MyMain {

    public static void main(String[] args) {

        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            SAXParser parser = factory.newSAXParser();
            File file = new File("D:\\Loic_Workspace\\TestSAX\\res\\test.xml");
            ChannelsHandler handler = new ChannelsHandler();
            parser.parse(file, handler);
            List<Channel> tvGuide = handler.getTVGuide();
            for (Channel c : tvGuide) {
                System.out.println(c.getName());
                System.out.println("------------------------");
                for (Programme p : c.getAlProgrammes()) {
                    System.out.println(p.getTitle());
                    System.out.println(p.getStart_time() + " - " + p.getEnd_time());
                    System.out.println(p.getGenres());
                    System.out.println("Season : " + p.getSeason() + " / Episode : " + p.getEpisode());
                    System.out.println("Description:\n" + p.getDescription());
                    System.out.println("**********************");
                }

                System.out.println("//////////////////////////");
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}

My console output:

BBC ONE
------------------------
The Celebrity Apprentice USA
2013-02-04 00:15:00 - 2013-02-04 01:40:00
Entertainment
Season : 8 / Episode : 9
Description:
Blah blah blah
**********************
The Celebrity Apprentice USA
2013-02-04 01:45:00 - 2013-02-04 02:25:00
Entertainment
Season : 8 / Episode : 10
Description:
Blah blah blah
**********************
//////////////////////////
BBC TWO
------------------------
American Dad
2013-02-04 00:30:00 - 2013-02-04 01:25:00
Cartoon
Season : 14 / Episode : 1
Description:
Blah blah blah
**********************
American Dad
2013-02-04 01:30:00 - 2013-02-04 02:15:00
Cartoon
Season : 14 / Episode : 2
Description:
Blah blah blah
**********************
//////////////////////////

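This is my first time using SAX. Maybe you can find something more efficient, but it works :-) In this update I did not handle the oid tags, which appear in both channels and programmes.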
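Since <oid> appears at both levels, a handler has to decide from context which object an <oid> belongs to. In ChannelsHandler the same check could be programme != null, because programme is only non-null between <programme> and </programme>. This minimal standalone sketch uses an explicit flag instead; OidRoutingDemo, OidHandler and parse are hypothetical names, and the oids are collected as plain strings rather than set on Channel/Programme objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class OidRoutingDemo {

    public static class OidHandler extends DefaultHandler {
        public final List<String> channelOids = new ArrayList<>();
        public final List<String> programmeOids = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inProgramme = false; // true between <programme> and </programme>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("programme")) {
                inProgramme = true;
            }
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("programme")) {
                inProgramme = false;
            } else if (qName.equals("oid")) {
                // Same tag name, different owner depending on where we are.
                if (inProgramme) {
                    programmeOids.add(text.toString().trim());
                } else {
                    channelOids.add(text.toString().trim());
                }
            }
            text.setLength(0);
        }
    }

    public static OidHandler parse(String xml) throws Exception {
        OidHandler handler = new OidHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        OidHandler h = parse("<channels><channel><oid>10029</oid><programmes>"
                + "<programme><oid>10583734</oid></programme>"
                + "</programmes></channel></channels>");
        System.out.println(h.channelOids + " " + h.programmeOids); // [10029] [10583734]
    }
}
```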