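I have an XML document with the following structure, and I want to retrieve the text to the left and right of each <ref> tag (using Java + dom4j):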
<article>
<article-meta></article-meta>
<body>
<p>
Extensible Markup Language (XML) is a markup language that defines a set of
rules for encoding documents in a format that is both human-readable and
machine-readable <ref id="1">1</ref>. It is defined in the XML 1.0 Specification produced
by the W3C, and several other related specifications
</p>
<p>
Many application programming interfaces (APIs) have been developed to aid
software developers with processing XML <ref id="2">2</ref>. data, and several schema
systems exist to aid in the definition of XML-based languages.
</p>
</body>
</article>
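I want to retrieve the text around the tag. For example, for this XML and the tag
<ref id="1">1</ref>
the result would be:
Left: human-readable and machine-readable
Right: It is defined in the XML 1.0 Specification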
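The expected output keeps only a few words on each side of the <ref>, not the whole text before and after it. If that is the goal, one option is to cut the full left/right strings down to a word window. This is a minimal sketch assuming plain whitespace-separated words; the class RefContext, the helpers lastWords/firstWords and the window sizes 3 and 8 (picked only to reproduce the example above) are hypothetical, since the question does not state a rule.

```java
import java.util.Arrays;

public class RefContext {

    /** Hypothetical helper: the last n whitespace-separated words of text. */
    static String lastWords(String text, int n) {
        String[] words = text.trim().split("\\s+");
        int from = Math.max(0, words.length - n);
        return String.join(" ", Arrays.copyOfRange(words, from, words.length));
    }

    /** Hypothetical helper: the first n whitespace-separated words of text. */
    static String firstWords(String text, int n) {
        String[] words = text.trim().split("\\s+");
        int to = Math.min(n, words.length);
        return String.join(" ", Arrays.copyOfRange(words, 0, to));
    }

    public static void main(String[] args) {
        // Full text before and after <ref id="1"> in the first paragraph.
        String left = "rules for encoding documents in a format that is both human-readable and machine-readable";
        String right = ". It is defined in the XML 1.0 Specification produced by the W3C";
        // Drop the punctuation that directly follows </ref> before counting words.
        String rightClean = right.replaceFirst("^[\\s.,;:]+", "");
        System.out.println("Left: " + lastWords(left, 3));
        System.out.println("Right: " + firstWords(rightClean, 8));
    }
}
```

The window sizes are the only tunable part; a sentence-based cut (stop at the next ".") would be an equally valid reading of the question.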
Answer (score: 0)
Try this:
import java.io.File;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class TestDom4j {

    /** Parses the given XML file into a dom4j Document. */
    public static Document getDocument(final String xmlFileName) throws DocumentException {
        SAXReader reader = new SAXReader();
        return reader.read(new File(xmlFileName));
    }

    public static void main(String[] args) throws DocumentException {
        String xmlFileName = "data.xml";
        // Select every <p> under <body>; dom4j's XPath support needs Jaxen on the classpath.
        String xPath = "//article/body/p";
        Document document = getDocument(xmlFileName);
        List<Node> nodes = document.selectNodes(xPath);
        for (Node node : nodes) {
            // Serialized paragraph, e.g. "<p>...<ref id="1">1</ref>...</p>"
            String nodeXml = node.asXML();
            int refStart = nodeXml.indexOf("<ref");
            int refEnd = nodeXml.indexOf("</ref>");
            if (refStart < 0 || refEnd < 0) {
                continue; // this paragraph has no <ref> element
            }
            // Skip the leading "<p>" (3 chars) and the trailing "</p>" (4 chars).
            System.out.println("Left >> " + nodeXml.substring(3, refStart).trim());
            System.out.println("Right >> " + nodeXml.substring(refEnd + "</ref>".length(), nodeXml.length() - 4).trim());
        }
    }
}
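Slicing the output of asXML() depends on fixed character offsets, so it breaks if <p> ever carries attributes or holds more than one <ref>. A sketch of an alternative that stays inside the dom4j tree: walk each paragraph's mixed content with Element.content() and take the text node directly before and after each <ref>. It assumes dom4j 2.x (where content() and elements(String) return typed lists); the class name RefNeighbours, the helper neighbourText and the inline sample string are hypothetical, for illustration only.

```java
import java.util.List;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;

public class RefNeighbours {

    public static void main(String[] args) throws DocumentException {
        // Hypothetical inline sample; in practice read data.xml with SAXReader as in the answer.
        String xml = "<article><body><p>both human-readable and machine-readable "
                + "<ref id=\"1\">1</ref>. It is defined in the XML 1.0 Specification</p></body></article>";
        Document document = DocumentHelper.parseText(xml);
        Element body = document.getRootElement().element("body");

        for (Element p : body.elements("p")) {
            List<Node> content = p.content(); // text and element children, in document order
            for (int i = 0; i < content.size(); i++) {
                Node node = content.get(i);
                if (node.getNodeType() == Node.ELEMENT_NODE && "ref".equals(node.getName())) {
                    System.out.println("ref " + ((Element) node).attributeValue("id"));
                    System.out.println("Left >> " + neighbourText(content, i - 1));
                    System.out.println("Right >> " + neighbourText(content, i + 1));
                }
            }
        }
    }

    /** Trimmed text of the node at index if it is a text node, otherwise "". */
    static String neighbourText(List<Node> content, int index) {
        if (index < 0 || index >= content.size()) {
            return "";
        }
        Node node = content.get(index);
        return node.getNodeType() == Node.TEXT_NODE ? node.getText().trim() : "";
    }
}
```

Because nothing is re-parsed from serialized XML, a paragraph with several <ref> elements is handled the same way; a <ref> that sits directly next to another element just gets an empty string on that side.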