用Java读取多级XML文件

时间:2012-06-29 20:49:04

标签: java xml

直到最近,我的XML文件的标记结构相当简单。但现在我有一个带标签的额外标签级别,解析XML变得更加复杂。

以下是我的新XML文件的示例(我更改了标记名称以便于理解):

<SchoolRoster>
   <Student>
      <name>John</name>
      <age>14</age>
      <course>
         <math>A</math>
         <english>B</english>
      </course>
      <course>
         <government>A+</government>
      </course>
   </Student>
   <Student>
      <name>Tom</name>
      <age>13</age>
      <course>
         <gym>A</gym>
         <geography>incomplete</geography>
      </course>
   </Student>
</SchoolRoster>

上面的XML的重要特征是我可以有多个“课程”属性,在里面我可以任意命名标签作为他们的孩子。并且可以有任意数量的这些孩子,我想读入“名称”,“价值”的HashMap。

public static TreeMap getAllSchoolRosterInformation(String fileName) {
    TreeMap SchoolRoster = new TreeMap();

    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        File file = new File(fileName);
        if (file.exists()) {
            Document doc = db.parse(file);
            Element docEle = doc.getDocumentElement();
            NodeList studentList = docEle.getElementsByTagName("Student");

            if (studentList != null && studentList.getLength() > 0) {
                for (int i = 0; i < studentList.getLength(); i++) {

                    Student aStudent = new Student();
                    Node node = studentList.item(i);

                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element e = (Element) node;
                        NodeList nodeList = e.getElementsByTagName("name");
                        aStudent.setName(nodeList.item(0).getChildNodes().item(0).getNodeValue());

                        nodeList = e.getElementsByTagName("age");
                        aStudent.setAge(Integer.parseInt(nodeList.item(0).getChildNodes().item(0).getNodeValue()));

                        nodeList = e.getElementsByTagName("course");
                        if (nodeList != null && nodeList.getLength() > 0) {
                            Course[] courses = new Course[nodeList.getLength()];
                            for (int j = 0; j < nodeList.getLength(); j++) {

                                Course singleCourse = new Course();
                                HashMap classGrades = new HashMap();
                                NodeList CourseNodeList = nodeList.item(j).getChildNodes();

                                for (int k = 0; k < CourseNodeList.getLength(); k++) {
                                    if (CourseNodeList.item(k).getNodeType() == Node.ELEMENT_NODE && CourseNodeList != null) {
                                        classGrades.put(CourseNodeList.item(k).getNodeName(), CourseNodeList.item(k).getNodeValue());
                                    }
                                }
                                singleCourse.setRewards(classGrades);
                                Courses[j] = singleCourse;
                            }
                            aStudent.setCourses(Courses);
                        }
                    }
                    SchoolRoster.put(aStudent.getName(), aStudent);
                }
            }
        } else {
            System.exit(1);
        }
    } catch (Exception e) {
        System.out.println(e);
    }
    return SchoolRoster;
}

我遇到的问题是,学生在“数学”中得到“A”而不是得到“数学”中的“A”。 (如果这篇文章太长,我可以尝试找一些缩短它的方法。)

3 个答案:

答案 0 :(得分:5)

如果这是我的项目,我会避免尝试手动剖析HTML中的数据,而是让Java通过使用JAXB为我做。我使用这个工具越多,我就越喜欢它。我敦促你考虑尝试这个,因为如果你这样做,你需要将XML更改为Java对象是Java类中正确的注释,然后解组XML。使用的代码会更简单,因此更不容易出错。

例如,以下代码非常容易且干净地将信息编组到XML中:

import java.util.ArrayList;
import java.util.List;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement
public class SchoolRoster {
   @XmlElement(name = "student")
   private List<Student> students = new ArrayList<Student>();

   public SchoolRoster() {
   }

   public List<Student> getStudents() {
      return students;
   }

   public void addStudent(Student student) {
      students.add(student);
   }

   public static void main(String[] args) {
      Student john = new Student("John", 14);
      john.addCourse(new Course("math", "A"));
      john.addCourse(new Course("english", "B"));

      Student tom = new Student("Tom", 13);
      tom.addCourse(new Course("gym", "A"));
      tom.addCourse(new Course("geography", "incomplete"));

      SchoolRoster roster = new SchoolRoster();
      roster.addStudent(tom);
      roster.addStudent(john);

      try {
         JAXBContext context = JAXBContext.newInstance(SchoolRoster.class);
         Marshaller marshaller = context.createMarshaller();
         marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

         String pathname = "MySchoolRoster.xml";
         File rosterFile = new File(pathname );
         marshaller.marshal(roster, rosterFile);
         marshaller.marshal(roster, System.out);

      } catch (JAXBException e) {
         e.printStackTrace();
      }
   }
}

@XmlRootElement
@XmlType(propOrder = { "name", "age", "courses" })
class Student {
  // TODO: completion left as an exercise for the original poster
}

@XmlRootElement
@XmlType(propOrder = { "name", "grade" })
class Course {
  // TODO: completion left as an exercise for the original poster
}

这产生了以下XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<schoolRoster>
    <student>
        <name>Tom</name>
        <age>13</age>
        <courses>
            <course>
                <name>gym</name>
                <grade>A</grade>
            </course>
            <course>
                <name>geography</name>
                <grade>incomplete</grade>
            </course>
        </courses>
    </student>
    <student>
        <name>John</name>
        <age>14</age>
        <courses>
            <course>
                <name>math</name>
                <grade>A</grade>
            </course>
            <course>
                <name>english</name>
                <grade>B</grade>
            </course>
        </courses>
    </student>
</schoolRoster>

要将其解组为一个充满数据的SchoolRoster类,只需要几行代码。

private static void unmarshallTest() {
  try {
     JAXBContext context = JAXBContext.newInstance(SchoolRoster.class);
     Unmarshaller unmarshaller = context.createUnmarshaller();

     String pathname = "MySchoolRoster.xml"; // whatever the file name should be
     File rosterFile = new File(pathname );
     SchoolRoster roster = (SchoolRoster) unmarshaller.unmarshal(rosterFile);
     System.out.println(roster);
  } catch (JAXBException e) {
     e.printStackTrace();
  }
}

toString()方法添加到我的课程后,结果是:

SchoolRoster 
  [students=
    [Student [name=Tom, age=13, courses=[Course [name=gym, grade=A], Course [name=geography, grade=incomplete]]], 
    Student [name=John, age=14, courses=[Course [name=math, grade=A], Course [name=english, grade=B]]]]]

答案 1 :(得分:4)

for (int k = 0; k < CourseNodeList.getLength(); k++) {
    if (CourseNodeList.item(k).getNodeType() == Node.ELEMENT_NODE && CourseNodeList != null) {
        classGrades.put(CourseNodeList.item(k).getNodeName(),
        CourseNodeList.item(k).getNodeValue());
    }
}

您在getNodeValue()上呼叫Element。根据JDK API文档,返回null。

http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/Node.html

您需要获取子Text节点并在其上调用getNodeValue()。这是一个非常快速和肮脏的方法:

classGrades.put(CourseNodeList.item(k).getNodeName(),
        CourseNodeList.item(k).getChildNodes().item(0).getNodeValue());

请不要在生产代码中使用它。它很丑。但它会指出你正确的方向。

答案 2 :(得分:1)

与@Hovercraft一样,我建议使用库来处理xml的序列化。我发现Xstream具有出色的性能和易用性。 http://x-stream.github.io/

例如:

public static void saveStudentsXML(FileOutputStream file) throws Exception {
    if (xstream == null)
        initXstream();

    xstream.toXML(proctorDAO.studentList, file);
    file.close(); 
 }

 public static void initXstream() {
    xstream = new XStream();
    xstream.alias("student", Student.class);

    xstream.useAttributeFor(Student.class, "lastName");
    xstream.useAttributeFor(Student.class, "firstName");
    xstream.useAttributeFor(Student.class, "id");
    xstream.useAttributeFor(Student.class, "gradYear");
    xstream.aliasAttribute(Student.class, "lastName", "last");
    xstream.aliasAttribute(Student.class, "gradYear", "gc");
    xstream.aliasAttribute(Student.class, "firstName", "first");
}

示例XML以演示嵌套属性:

<list>
  <student first="Ralf" last="Adams" gc="2014" id="100">
     <testingMods value="1" boolMod="2"/>
  </student>
  <student first="Mick" last="Agosti" gc="2014" id="102">
     <testingMods value="1" boolMod="2"/>
  </student>
  <student first="Edmund" last="Baggio" gc="2013" id="302">
     <testingMods value="1" boolMod="6"/>
  </student> 
</list>