I have an XML file like this:
<?xml version="1.0" encoding="UTF-8"?>
<collection>
<source />
<date />
<key />
<document>
<id>AIMed_d30</id>
<passage>
<offset>0</offset>
<text>Isolation of human delta-catenin and its binding specificity with presenilin 1. We screened proteins for interaction with presenilin (PS) 1, and cloned the full-length cDNA of human delta-catenin, which encoded 1225 amino acids. Yeast two-hybrid assay, GST binding assay and immunoprecipitation demonstrated that delta-catenin interacted with a hydrophilic loop region in the endoproteolytic C-terminal fragment of PS1, but not with that of PS-2. These results suggest that PS1 and PS2 partly differ in function. PS1 loop fragment containing the pathogenic mutation retained the binding ability. We also found another armadillo-protein, p0071, interacted with PS1.</text>
<annotation id="T1">
<infon key="file">ann</infon>
<infon key="type">protein</infon>
<location length="13" offset="19" />
<text>delta-catenin</text>
</annotation>
<annotation id="T3">
<infon key="file">ann</infon>
<infon key="type">protein</infon>
<location length="17" offset="122" />
<text>presenilin (PS) 1</text>
</annotation>
<annotation id="T2">
<infon key="file">ann</infon>
<infon key="type">protein</infon>
<location length="12" offset="66" />
<text>presenilin 1</text>
</annotation>
<relation id="R4">
<infon key="relation type">Interaction</infon>
<infon key="file">ann</infon>
<infon key="type">Relation</infon>
<node role="Arg1" refid="T12" />
<node role="Arg2" refid="T13" />
</relation>
<relation id="R2">
<infon key="relation type">Interaction</infon>
<infon key="file">ann</infon>
<infon key="type">Relation</infon>
<node role="Arg1" refid="T3" />
<node role="Arg2" refid="T4" />
</relation>
<relation id="R3">
<infon key="relation type">Interaction</infon>
<infon key="file">ann</infon>
<infon key="type">Relation</infon>
<node role="Arg1" refid="T5" />
<node role="Arg2" refid="T6" />
</relation>
<relation id="R1">
<infon key="relation type">Interaction</infon>
<infon key="file">ann</infon>
<infon key="type">Relation</infon>
<node role="Arg1" refid="T1" />
<node role="Arg2" refid="T2" />
</relation>
</passage>
</document>
</collection>
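But when I read this XML file with DOM, I run into some problems. For example, the annotation tag contains 8 item tags, but when I print the result I get 10 or more. For the relation tag, it does not work properly at all. Here is my Java code: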
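import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XMLRead {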
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException{
try{
File fXmlFile = new File("D:/THESIS/DataSet/Newfolder/Newfolder/aimed_bioc2.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("document");
System.out.println("OK----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);

System.out.println("\nCurrent Element :" + nNode.getNodeName());

if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("id : " + eElement.getElementsByTagName("id").item(0).getTextContent());
// NodeList nList2 = doc.getElementsByTagName("passage");
// for(int i=0; i< nList2.getLength(); i++)
// {
System.out.println("\toffset : " + eElement.getElementsByTagName("offset").item(0).getTextContent());
System.out.println("\ttext: " + eElement.getElementsByTagName("text").item(0).getTextContent());
System.out.println("----------------------------");
NodeList nList3 = doc.getElementsByTagName("annotation");
for (int temp2 = 0; temp2 < nList3.getLength(); temp2++) {
Node nNode2 = nList3.item(temp2);
System.out.println("\n\n");
if(nNode2.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement2 = (Element) nNode2;
System.out.println("\tannotation id : " + eElement2.getAttribute("id"));
NodeList nList4=doc.getElementsByTagName("infon");
Node nNode3=nList4.item(0);
Node nNode4=nList4.item(1);
Element eElement3= (Element) nNode3;
Element eElement4= (Element) nNode4;
System.out.println("\t\tinfon key : " + eElement3.getAttribute("key")
+", infon : " +eElement.getElementsByTagName("infon").item(0).getTextContent());
System.out.println("\t\tinfon key : " + eElement4.getAttribute("key")
+ ", infon : " +eElement.getElementsByTagName("infon").item(1).getTextContent());
NodeList nList5 = doc.getElementsByTagName("location");
Node nNode5=nList5.item(temp2);
Element eElement5=(Element) nNode5;
System.out.println("\t\tLocation Lenght : " +eElement5.getAttribute("length")
+" ,Location offset : " + eElement5.getAttribute("offset"));
System.out.println("\t\tannotation text : "+ eElement2.getElementsByTagName("text").item(0).getTextContent());
}
}
System.out.println("----------------------------");
NodeList nList6 = doc.getElementsByTagName("relation");
for (int temp3 = 0; temp3 < nList6.getLength(); temp3++) {
Node nNode6 = nList6.item(temp3);
System.out.println("\n\n");
if(nNode6.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement6 = (Element) nNode6;
System.out.println("\tRelation id : " + eElement6.getAttribute("id"));
Node nNode14=nList6.item(0);
Element eElement14=(Element) nNode14;
NodeList nList7=doc.getElementsByTagName("infon");
for(int temp5 = 0; temp5<nList7.getLength(); temp5++){
Node nNode7=nList7.item(temp5);
Node nNode8=nList7.item(1);
Node nNode9=nList7.item(2);
Element eElement7= (Element) nNode7;
Element eElement8= (Element) nNode8;
Element eElement9= (Element) nNode9;
System.out.println("\t\tinfon key : " + eElement7.getAttribute("key")
+" ,infon : " +eElement6.getElementsByTagName("infon").item(0).getTextContent());}
System.out.println("\n\n");
NodeList nList8 = doc.getElementsByTagName("node");
for(int temp4=0; temp4<nList8.getLength(); temp4++)
{
Node nNode12 = nList8.item(temp4);
Element eElement12 = (Element) nNode12;
System.out.println("\t\tNode Role : " +eElement12.getAttribute("role")
+" ,refid : " + eElement12.getAttribute("refid"));
}
}
}
}
// }
}
}
catch (IOException e) {
e.printStackTrace();
}
}
}
Answer 0 (score: 0)
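While your program is processing a child element, doc.getElementsByTagName(tagname) still returns the matching nodes of the whole document, not just those under the element you are currently looking at. If that is not what you want, call getElementsByTagName on the element itself instead. For example: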
NodeList nList6 = doc.getElementsByTagName("relation");
for (int temp3 = 0; temp3 < nList6.getLength(); temp3++) {
Node nNode6 = nList6.item(temp3);
System.out.println("\n\n");
if(nNode6.getNodeType() == Node.ELEMENT_NODE)
{
Element eElement6 = (Element) nNode6;
System.out.println("\tRelation id : " + eElement6.getAttribute("id"));
Node nNode14=nList6.item(0);
Element eElement14=(Element) nNode14;
// Wrong: this searches the whole document
// NodeList nList7 = doc.getElementsByTagName("infon");
// Correct: returns only the descendant elements of eElement6
NodeList nList7 = eElement6.getElementsByTagName("infon");
//...
}
}
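To see the difference between the two calls in isolation, here is a minimal self-contained sketch. The class name ScopedLookupDemo and the cut-down XML string are illustrative assumptions, not part of the original program; only the element and attribute names are taken from the question's file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScopedLookupDemo {
    // Cut-down sample in the same shape as the question's file (illustrative only).
    static final String XML =
        "<collection><document><passage>"
        + "<annotation id=\"T1\">"
        + "<infon key=\"file\">ann</infon><infon key=\"type\">protein</infon>"
        + "</annotation>"
        + "<relation id=\"R1\">"
        + "<infon key=\"relation type\">Interaction</infon>"
        + "<infon key=\"file\">ann</infon><infon key=\"type\">Relation</infon>"
        + "<node role=\"Arg1\" refid=\"T1\"/><node role=\"Arg2\" refid=\"T2\"/>"
        + "</relation>"
        + "</passage></document></collection>";

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element relation = (Element) doc.getElementsByTagName("relation").item(0);

        // Document-wide: every <infon>, including the ones under <annotation>.
        NodeList all = doc.getElementsByTagName("infon");
        // Element-scoped: only the <infon> descendants of this <relation>.
        NodeList own = relation.getElementsByTagName("infon");

        System.out.println("document-wide infon count: " + all.getLength());
        System.out.println("relation-scoped infon count: " + own.getLength());

        for (int i = 0; i < own.getLength(); i++) {
            Element infon = (Element) own.item(i);
            System.out.println(infon.getAttribute("key") + " = " + infon.getTextContent());
        }
    }
}
```

With this sample the document-wide count is 5 and the relation-scoped count is 3, which is the same kind of over-counting the question describes.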
Answer 1 (score: 0)
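This is a good demonstration of how quickly code written against org.w3c.dom.* becomes hard to read. Parsing into a typed class structure is one way to avoid that.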
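As a rough illustration of the typed-structure idea, the sketch below reads each annotation into a small value class using only the standard DOM API. The names TypedParseDemo, Annotation, readAnnotations and infon are hypothetical, chosen for this example, and the XML string is a cut-down sample in the shape of the question's file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TypedParseDemo {
    // Hypothetical value class for one <annotation>; field names are illustrative.
    static final class Annotation {
        final String id;
        final String type;
        final int offset;
        final int length;
        final String text;

        Annotation(String id, String type, int offset, int length, String text) {
            this.id = id;
            this.type = type;
            this.offset = offset;
            this.length = length;
            this.text = text;
        }
    }

    // Cut-down sample in the same shape as the question's file (illustrative only).
    static final String XML =
        "<collection><document><passage>"
        + "<annotation id=\"T1\"><infon key=\"file\">ann</infon>"
        + "<infon key=\"type\">protein</infon>"
        + "<location length=\"13\" offset=\"19\"/><text>delta-catenin</text></annotation>"
        + "<annotation id=\"T2\"><infon key=\"file\">ann</infon>"
        + "<infon key=\"type\">protein</infon>"
        + "<location length=\"12\" offset=\"66\"/><text>presenilin 1</text></annotation>"
        + "</passage></document></collection>";

    // Returns the text of the <infon> with the given key under parent, or null if absent.
    static String infon(Element parent, String key) {
        NodeList infons = parent.getElementsByTagName("infon");
        for (int i = 0; i < infons.getLength(); i++) {
            Element e = (Element) infons.item(i);
            if (key.equals(e.getAttribute("key"))) {
                return e.getTextContent();
            }
        }
        return null;
    }

    // Reads every <annotation> into an Annotation, looking up children on the element itself.
    static List<Annotation> readAnnotations(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Annotation> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("annotation");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element a = (Element) nodes.item(i);
            Element loc = (Element) a.getElementsByTagName("location").item(0);
            result.add(new Annotation(
                a.getAttribute("id"),
                infon(a, "type"),
                Integer.parseInt(loc.getAttribute("offset")),
                Integer.parseInt(loc.getAttribute("length")),
                a.getElementsByTagName("text").item(0).getTextContent()));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (Annotation a : readAnnotations(XML)) {
            System.out.printf("%s [%s] offset=%d length=%d text=%s%n",
                a.id, a.type, a.offset, a.length, a.text);
        }
    }
}
```

Once the data is in plain objects, the printing code no longer touches NodeList or casts, and a missing element fails in one place (the reader) rather than wherever the value happens to be used.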
Alternatively, if you can use Java 8, my library Dynamics offers a more readable way to do this, along with null-safety and more descriptive error reporting.
File fXmlFile = new File("D:/THESIS/DataSet/Newfolder/Newfolder/aimed_bioc2.xml");
XmlDynamic xml = new XmlDynamic(new FileReader(fXmlFile));
xml.get("collection").children()
.filter(hasElementName("document"))
.forEach(document -> {
System.out.println("id : " + document.get("id").asString());
System.out.println("\toffset : " + document.get("passage|offset").asString());
System.out.println("\ttext: " + document.get("passage|text").asString());
System.out.println("----------------------------");
Dynamic passage = document.get("passage");
passage.children()
.filter(hasElementName("annotation"))
.forEach(annotation -> {
System.out.println("\tannotation id : " + annotation.get("id").asString());
annotation.children()
.filter(hasElementName("infon"))
.forEach(infon -> {
System.out.printf("\t\tinfon key : %s, infon : %s%n",
infon.get("key").asString(), infon.asString());
});
System.out.printf("\t\tlocation Length : %s, location offset : %s%n",
annotation.get("location|length").asString(), annotation.get("location|offset").asString());
System.out.println("\t\tannotation text : "+ annotation.get("text").asString());
});
System.out.println("----------------------------");
passage.children()
.filter(hasElementName("relation"))
.forEach(relation -> {
System.out.println("\trelation id : " + relation.get("id").asString());
relation.children()
.filter(hasElementName("infon"))
.forEach(infon -> {
System.out.printf("\t\tinfon key : %s, infon : %s%n",
infon.get("key").asString(), infon.asString());
});
relation.children()
.filter(hasElementName("node"))
.forEach(node -> {
System.out.printf("\t\tnode role : %s, refid : %s%n",
node.get("role").asString(), node.get("refid").asString());
});
});
});
Output:
id : AIMed_d30
offset : 0
text: Isolation of human delta-catenin and its binding specificity with presenilin 1. We screened proteins for interaction with presenilin (PS) 1, and cloned the full-length cDNA of human delta-catenin, which encoded 1225 amino acids. Yeast two-hybrid assay, GST binding assay and immunoprecipitation demonstrated that delta-catenin interacted with a hydrophilic loop region in the endoproteolytic C-terminal fragment of PS1, but not with that of PS-2. These results suggest that PS1 and PS2 partly differ in function. PS1 loop fragment containing the pathogenic mutation retained the binding ability. We also found another armadillo-protein, p0071, interacted with PS1.
----------------------------
annotation id : T1
infon key : file, infon : ann
infon key : type, infon : protein
location Length : 13, location offset : 19
annotation text : delta-catenin
annotation id : T3
infon key : file, infon : ann
infon key : type, infon : protein
location Length : 17, location offset : 122
annotation text : presenilin (PS) 1
annotation id : T2
infon key : file, infon : ann
infon key : type, infon : protein
location Length : 12, location offset : 66
annotation text : presenilin 1
----------------------------
relation id : R4
infon key : relation type, infon : Interaction
infon key : file, infon : ann
infon key : type, infon : Relation
node role : Arg1, refid : T12
node role : Arg2, refid : T13
relation id : R2
infon key : relation type, infon : Interaction
infon key : file, infon : ann
infon key : type, infon : Relation
node role : Arg1, refid : T3
node role : Arg2, refid : T4
relation id : R3
infon key : relation type, infon : Interaction
infon key : file, infon : ann
infon key : type, infon : Relation
node role : Arg1, refid : T5
node role : Arg2, refid : T6
relation id : R1
infon key : relation type, infon : Interaction
infon key : file, infon : ann
infon key : type, infon : Relation
node role : Arg1, refid : T1
node role : Arg2, refid : T2