I have the following XML file:
<RecordSet>
  <Record>
    <ID>001</ID>
    <TermList>
      <Term>Term1</Term>
      <Term>Term2</Term>
      <Term>Term3</Term>
    </TermList>
  </Record>
  <Record>
    <ID>002</ID>
    <TermList>
      <Term>Term3</Term>
      <Term>Term4</Term>
      <Term>Term5</Term>
    </TermList>
  </Record>
</RecordSet>
and I need to parse it into an "ID-Term" file, i.e.
001 Term1
001 Term2
001 Term3
002 Term3
002 Term4
002 Term5
Currently I have the following application:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class MedlineParser {
    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder;
        Document doc = null;
        try {
            builder = factory.newDocumentBuilder();
            doc = builder.parse("/home/andrej/Documents/test.xml");
            // Create XPathFactory object
            XPathFactory xpathFactory = XPathFactory.newInstance();
            // Create XPath object
            XPath xpath = xpathFactory.newXPath();
            try {
                XPathExpression expr1 = xpath.compile("/RecordSet/Record/ID/text()");
                NodeList nodes1 = (NodeList) expr1.evaluate(doc, XPathConstants.NODESET);
                for (int i = 0; i < nodes1.getLength(); i++) {
                    String id = nodes1.item(i).getNodeValue();
                    XPathExpression expr2 = xpath.compile("/RecordSet/Record/TermList/Term/text()");
                    NodeList nodes2 = (NodeList) expr2.evaluate(doc, XPathConstants.NODESET);
                    for (int j = 0; j < nodes2.getLength(); j++) {
                        System.out.println(id + " " + nodes2.item(i).getNodeValue());
                    }
                }
            } catch (XPathExpressionException e) {
                e.printStackTrace();
            }
        } catch (IOException | ParserConfigurationException | SAXException e) {
            e.printStackTrace();
        }
    }
}
Unfortunately, the program output is currently:
001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
001 Term1
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2
002 Term2
Any idea what is wrong with the XPath expressions?
Answer 0: (score: 1)
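Two problems:
The XPath must take into account the index of the ID node being iterated in the first loop. Your current XPath fetches all Term nodes in the document for every ID node. Because XPath positions are 1-based, you should change it to:
XPathExpression expr2 = xpath.compile("/RecordSet/Record[" + (i + 1) + "]/TermList/Term/text()");
There is a typo in the inner for loop. You should use j instead of i:
for (int j = 0; j < nodes2.getLength(); j++) {
    System.out.println(id + " " + nodes2.item(j).getNodeValue());
}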
Answer 1: (score: 1)
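It seems you are printing the Cartesian product of all IDs and terms.
This would be easier:
Select and loop over all Record nodes using the XPath /RecordSet/Record.
For each Record node, get the ID (using the relative XPath ID) and the terms (using the relative XPath TermList/Term).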
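A minimal sketch of this per-Record approach, assuming the XML is supplied as a string rather than read from the file path in the question. The class name RecordTermPairs and the helper idTermPairs are hypothetical names chosen for illustration. The key point is that the XPath expressions ID and TermList/Term are evaluated relative to each Record node, not against the whole document:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class RecordTermPairs {

    // Returns one "ID Term" string per Term, grouped by the Record it belongs to.
    static List<String> idTermPairs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> pairs = new ArrayList<>();

        // Select every Record node once.
        NodeList records = (NodeList) xpath.evaluate(
                "/RecordSet/Record", doc, XPathConstants.NODESET);
        for (int i = 0; i < records.getLength(); i++) {
            Node record = records.item(i);
            // Relative XPath: evaluated with the current Record as context node.
            String id = xpath.evaluate("ID", record);
            NodeList terms = (NodeList) xpath.evaluate(
                    "TermList/Term", record, XPathConstants.NODESET);
            for (int j = 0; j < terms.getLength(); j++) {
                pairs.add(id + " " + terms.item(j).getTextContent());
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml =
                "<RecordSet>"
              + "<Record><ID>001</ID><TermList>"
              + "<Term>Term1</Term><Term>Term2</Term><Term>Term3</Term>"
              + "</TermList></Record>"
              + "<Record><ID>002</ID><TermList>"
              + "<Term>Term3</Term><Term>Term4</Term><Term>Term5</Term>"
              + "</TermList></Record>"
              + "</RecordSet>";
        for (String pair : idTermPairs(xml)) {
            System.out.println(pair);
        }
    }
}
```

Because each Record is the context node for its own ID and Term lookups, no positional predicate such as Record[i + 1] is needed, and a Record with no terms simply contributes no lines.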