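I'm studying XML parsing and have picked up some knowledge from various resources. I'm a beginner in Java, but I still want to try everything I can.
Currently I've been trying to parse something like the following: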
<poem>
<line>Hey diddle, diddle
<i>the cat</i> and the fiddle.
</line>
</poem>
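That isn't the actual XML, but the real XML doesn't look much different, so I posted this instead (the idea is the same, I think).
I'm trying to get output like the following: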
Element : line
text : Hey diddle, diddle
element: i
text: the cat
text: and the fiddle.
------------------------
OR
------------------------
line: Hey diddle, diddle
i: the cat
and the fiddle
At the moment my code is as follows:
public class parsingWithDOM {
    public static void main(String[] args) {
        File xml = new File("/Users.../xmlTest.xml");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(xml);
            NodeList nList = doc.getElementsByTagName("line");
            Node l = nList.item(0);
            if (l.getNodeType() == Node.ELEMENT_NODE) {
                Element line = (Element) l;
                System.out.println(line.getTagName() + ": " + line.getTextContent());
                NodeList lineList = line.getChildNodes();
                for (int i = 0; i < lineList.getLength(); i++) {
                    Node node = lineList.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element lineElement = (Element) node;
                        System.out.println(lineElement.getTagName() + ": " + lineElement.getTextContent());
                    }
                }
            }
        } catch (IOException | ParserConfigurationException | DOMException | SAXException e) {
            System.out.println(e.getMessage());
        }
    }
}
Anyway, the output I get is this (not what I want):
line: Hey diddle, diddle the cat and the fiddle.
i: the cat
Any help would be greatly appreciated.
Answer 0 (score: 1)
You can navigate the DOM tree using the getFirstChild(), getNextSibling(), and getParentNode() methods, like this:
int level = 0;
Node node = doc.getDocumentElement();
while (node != null) {
    // Process node
    if (node.getNodeType() == Node.ELEMENT_NODE) {
        System.out.println(" ".repeat(level) + "Element: \"" + node.getNodeName() + "\"");
    } else if (node.getNodeType() == Node.TEXT_NODE || node.getNodeType() == Node.CDATA_SECTION_NODE) {
        String text = node.getNodeValue()
                .replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
        System.out.println(" ".repeat(level) + "Text: \"" + text + "\"");
    }
    // Advance to next node
    if (node.getFirstChild() != null) {
        node = node.getFirstChild();
        level++;
    } else {
        while (node.getNextSibling() == null && node.getParentNode() != null) {
            node = node.getParentNode();
            level--;
        }
        node = node.getNextSibling();
    }
}
The code uses the Java 11+ String.repeat(int count) method to indent the text. For earlier versions of Java, use another mechanism.
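Applied to the sample XML from the question, the same sibling navigation can be restricted to the direct children of the line element, which gives the first output format the question asks for. This is a minimal sketch, not a definitive implementation: the class name LineChildren and the helper describeLine are hypothetical, the XML is parsed from a string instead of a file, and whitespace-only text nodes are assumed to be unwanted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class LineChildren {

    // Hypothetical helper: describes the first <line> element, with one entry
    // per non-blank text node and two entries per child element.
    // Assumes the document contains at least one <line> element.
    static List<String> describeLine(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Node line = doc.getElementsByTagName("line").item(0);

        List<String> out = new ArrayList<>();
        out.add("Element: " + line.getNodeName());
        // Walk the direct children in document order, so text and elements stay interleaved.
        for (Node child = line.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add("text: " + text);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add("element: " + child.getNodeName());
                out.add("text: " + child.getTextContent().trim());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<poem>\n<line>Hey diddle, diddle\n<i>the cat</i> and the fiddle.\n</line>\n</poem>";
        describeLine(xml).forEach(System.out::println);
    }
}
```

The key difference from the code in the question is that getTextContent() is only called on the child element, never on line itself, because on line it concatenates the text of all descendants.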
Answer 1 (score: 1)
Many tasks are much easier in XSLT than in Java/DOM, and this is one of them. Here is a solution using XSLT 3.0.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://local/"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">
    <xsl:output method="text" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="*">
        <xsl:text>{f:indent(.)}ELEMENT {name()}</xsl:text>
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="text()">
        <xsl:text>{f:indent(.)}{.}</xsl:text>
    </xsl:template>
    <xsl:function name="f:indent" as="xs:string">
        <xsl:param name="node" as="node()"/>
        <xsl:sequence select="'&#xa;' || string-join((1 to count($node/ancestor::*))!'__')"/>
    </xsl:function>
</xsl:stylesheet>
The output is
ELEMENT poem
__ELEMENT line
____text: Hey diddle, diddle
____ELEMENT i
______text: the cat
____text: and the fiddle.
You can see it running at
https://xsltfiddle.liberty-development.net/gWEaSuR/1
Talking you through it:
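xsl:output says you want text output, not XML or HTML.
xsl:strip-space says to ignore text nodes in the input that contain only whitespace.
There are two xsl:template rules, one for elements and one for text nodes.
Both rules call the function f:indent, which generates indentation based on the depth of the node in the tree (found by counting ancestors).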
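For comparison, the same depth-based indentation (counting element ancestors) can be sketched with the Java DOM API. The class name IndentByDepth and the helpers parse and indent are hypothetical names chosen for illustration, and the sample document is a shortened version of the one in the question.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class IndentByDepth {

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns "__" once per element ancestor of the node; the Document node
    // at the top of the tree is not an element and is not counted.
    static String indent(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("__");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Node poem = parse("<poem><line>Hey <i>the cat</i></line></poem>").getDocumentElement();
        Node line = poem.getFirstChild();
        Node i = line.getLastChild();
        System.out.println(indent(poem) + "ELEMENT " + poem.getNodeName());
        System.out.println(indent(line) + "ELEMENT " + line.getNodeName());
        System.out.println(indent(i) + "ELEMENT " + i.getNodeName());
    }
}
```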
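Most of the work in this stylesheet goes into getting the output formatted correctly (navigating the input takes care of itself). I used underscores rather than spaces in the output so you can see the difference between whitespace from the input and whitespace generated by the stylesheet.
The JDK has a built-in XSLT 1.0 processor, but XSLT 3.0 has many extra features, and for that you need to install Saxon. Both processors can easily be invoked from a Java application.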
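As a rough illustration of invoking a processor from Java, the sketch below uses the standard javax.xml.transform (JAXP) API with the JDK's built-in processor. Because that processor only supports XSLT 1.0, the embedded stylesheet is a simplified 1.0 stand-in that prints element names only, not the 3.0 stylesheet above. The class name RunXslt, the constant XSLT, and the helper transform are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class RunXslt {

    // Illustrative XSLT 1.0 stylesheet: prints each element name on its own line.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='*'>"
        + "<xsl:text>ELEMENT </xsl:text><xsl:value-of select='name()'/><xsl:text>&#10;</xsl:text>"
        + "<xsl:apply-templates select='*'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    static String transform(String xslt, String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<poem><line>Hey diddle, diddle <i>the cat</i> and the fiddle.</line></poem>";
        System.out.print(transform(XSLT, xml));
    }
}
```

The stylesheet and input are passed as strings only to keep the example self-contained; StreamSource also accepts a File or an InputStream.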
Answer 2 (score: -1)
The code below should do what you asked for:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class ParsingWithDOM {
    public static void main(String[] args) {
        File xml = new File("sample.xml");
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(xml);
            StringBuilder sb_inner = new StringBuilder();
            NodeList nList = doc.getElementsByTagName("line");
            Node l = nList.item(0);
            if (l.getNodeType() == Node.ELEMENT_NODE) {
                Element line = (Element) l;
                String outer = line.getTagName() + ": " + line.getTextContent();
                NodeList lineList = line.getChildNodes();
                for (int i = 0; i < lineList.getLength(); i++) {
                    Node node = lineList.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element lineElement = (Element) node;
                        sb_inner.append(lineElement.getTagName() + ": " + lineElement.getTextContent()).append("\n");
                    }
                }
                String sub = sb_inner.toString();
                String[] formatter = sub.split("\n");
                for (int i = 0; i < formatter.length; i++) {
                    outer = outer.replace(formatter[i].split(":")[1].trim(),
                            formatter[i] + "\n");
                }
                System.out.println(outer);
            }
        } catch (IOException | ParserConfigurationException | DOMException | SAXException e) {
            System.out.println(e.getMessage());
        }
    }
}