使用SAX解析器,如何解析具有相同名称标签但在不同元素中的xml文件?

时间:2011-08-26 20:01:40

标签: java xml xpath xml-parsing saxparser

是否可以在SAX解析器中提供路径表达式?我有一个XML文件,它有几个相同的名称标签,但它们在不同的元素中。有没有办法区分它们。 这是XML:

<Schools>
    <School>
        <ID>335823</ID> 
        <Name>Fairfax High School</Name> 
        <Student>
            <ID>4195653</ID>
            <Name>Will Turner</Name>
        </Student>
        <Student>
            <ID>4195654</ID>
            <Name>Bruce Paltrow</Name>
        </Student>
        <Student>
            <ID>4195655</ID>
            <Name>Santosh Gowswami</Name>
        </Student>
    </School>
    <School>
        <ID>335824</ID> 
        <Name>FallsChurch High School</Name> 
        <Student>
            <ID>4153</ID>
            <Name>John Singer</Name>
        </Student>
        <Student>
            <ID>4154</ID>
            <Name>Shane Warne</Name>
        </Student>
        <Student>
            <ID>4155</ID>
            <Name>Eddie Diaz</Name>
        </Student>
    </School>
</Schools>

我想根据学校的名称和ID来区分学生的姓名和身份证明。

感谢您的回复:

我创建了一个学生pojo,其中包含以下字段:school_id,school_name,student_id和student_name以及getter和setter方法。这是我的临时解析器实现。当我解析xml时,我需要将学校名称,id,学生姓名,id的值放在pojo中并返回它。你能告诉我如何实现堆栈以区分。这是我的解析器框架::

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class HandleXML extends DefaultHandler {

    private student info;
    private boolean school_id = false;
    private boolean school_name = false;
    private boolean student_id = false;
    private boolean student_name = false;
    private boolean student = false;
    private boolean school = false;


    public HandleXML(student record) {
        super();
        this.info = record;
        school_id = false;
        school_name = false;
        student_id = false;
        student_name = false;
        student = false;
        school = false;
    }

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes)
            throws SAXException {
    if (qName.equalsIgnoreCase("student")) {
            student = true;
        }
    if (qName.equalsIgnoreCase("school")) {
            school_id = true;
        }
    if (qName.equalsIgnoreCase("school_id")) {
            school_id = true;
        }
    if (qName.equalsIgnoreCase("student_id")) {
            student_id = true;
        }
    if (qName.equalsIgnoreCase("school_name")) {
            school_name = true;
        }
    if (qName.equalsIgnoreCase("student_name")) {
            student_name = true;
        }
    }

    @Override
    public void endElement(String uri, String localName,
            String qName)
            throws SAXException {
    }

    @Override
    public void characters(char ch[], int start, int length)
            throws SAXException {

        String data = new String(ch, start, length);

    }
}

5 个答案:

答案 0 :(得分:15)

在SAX解析器中,按文档顺序给出每个元素。您必须维护堆栈以跟踪嵌套(在处理startElement时推入堆栈,并为endElement弹出)。您可以通过堆栈中当前的内容区分不同的<Name>元素。

或者,只需保留一个变量,告诉您是否遇到<School>代码或<Student>代码,告诉您正在看哪种类型的<Name>

答案 1 :(得分:13)

好吧,我在Java中没有和SAX玩过多年了,所以这是我对它的看法:

package play.xml.sax;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

public class Test1 {
    public static void main(String[] args) {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SchoolsHandler handler = new SchoolsHandler();
        try {
            SAXParser sp = spf.newSAXParser();
            sp.parse("schools.xml", handler);
            System.out.println("Number of read schools: " + handler.getSchools().size());
        } catch (SAXException se) {
            se.printStackTrace();
        } catch (ParserConfigurationException pce) {
            pce.printStackTrace();
        } catch (IOException ie) {
            ie.printStackTrace();
        }
    }
}

class SchoolsHandler extends DefaultHandler {
    private static final String TAG_SCHOOLS = "Schools";
    private static final String TAG_SCHOOL = "School";
    private static final String TAG_STUDENT = "Student";
    private static final String TAG_ID = "ID";
    private static final String TAG_NAME = "Name";

    private final Stack<String> tagsStack = new Stack<String>();
    private final StringBuilder tempVal = new StringBuilder();

    private List<School> schools;
    private School school;
    private Student student;

    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        pushTag(qName);
        tempVal.setLength(0);
        if (TAG_SCHOOLS.equalsIgnoreCase(qName)) {
            schools = new ArrayList<School>();
        } else if (TAG_SCHOOL.equalsIgnoreCase(qName)) {
            school = new School();
        } else if (TAG_STUDENT.equalsIgnoreCase(qName)) {
            student = new Student();
        }
    }

    public void characters(char ch[], int start, int length) {
        tempVal.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        String tag = peekTag();
        if (!qName.equals(tag)) {
            throw new InternalError();
        }

        popTag();
        String parentTag = peekTag();

        if (TAG_ID.equalsIgnoreCase(tag)) {
            int id = Integer.valueOf(tempVal.toString().trim());
            if (TAG_STUDENT.equalsIgnoreCase(parentTag)) {
                student.setId(id);
            } else if (TAG_SCHOOL.equalsIgnoreCase(parentTag)) {
                school.setId(id);
            }
        } else if (TAG_NAME.equalsIgnoreCase(tag)) {
            String name = tempVal.toString().trim();
            if (TAG_STUDENT.equalsIgnoreCase(parentTag)) {
                student.setName(name);
            } else if (TAG_SCHOOL.equalsIgnoreCase(parentTag)) {
                school.setName(name);
            }
        } else if (TAG_STUDENT.equalsIgnoreCase(tag)) {
            school.addStudent(student);
        } else if (TAG_SCHOOL.equalsIgnoreCase(tag)) {
            schools.add(school);
        }
    }

    public void startDocument() {
        pushTag("");
    }

    public List<School> getSchools() {
        return schools;
    }

    private void pushTag(String tag) {
        tagsStack.push(tag);
    }

    private String popTag() {
        return tagsStack.pop();
    }

    private String peekTag() {
        return tagsStack.peek();
    }
}

class School {
    private int id;
    private String name;
    private List<Student> students = new ArrayList<Student>();

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public void addStudent(Student student) {
        students.add(student);
    }

    public List<Student> getStudents() {
        return students;
    }
}

class Student {
    private int id;
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }
}

schools.xml包含您的示例XML。请注意,我把所有内容都塞进了一个文件中,但这只是因为我只是在玩耍。

答案 2 :(得分:2)

是的,使用SAX解析器理解xml通常比使用DOM更复杂。基本上,您需要在SAX解析器中维护状态/上下文,以便区分这些情况。

请注意,实现SAX处理程序的另一个关键是理解可以在多个字符事件中拆分值。

答案 3 :(得分:1)

Sax是基于事件的,通过回调,您可以连续读取XML文档。 Sax非常适合读取大型XML文档,因为整个文档没有加载到内存中。您可能需要查看Xpath,例如

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
String expression = "/Schools/school/ ...";
XPathExpression xPathExpression = xPath.compile(expression);
// Compile the expression to get a XPathExpression object.
Object result = xPathExpression.evaluate(xmlDocument);

答案 4 :(得分:0)

private boolean isInStudentNode;
...................................................    

public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    // enter node Student
    if(qName.equalEgnoreCase("Student"){
       isInStudentNode = true;
    }
    ...
}

public void endElement(String uri, String localName, String qName) throws SAXException {
    // end node Student
    if(qName.equalEgnoreCase("Student"){
       isInStudentNode = false;
       ...........
    }

    // end node Name (school|student)
    if(qName.equalEgnoreCase("Name"){
        if(isInStudentNode) student.setName(...);
        else school.setName(...);
    }
}

与我合作