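I need to read a large XML file with VTD-XML and XPath and split the result into multiple documents, one per matched node. I found some solutions here, but they split out the nodes without keeping the parent information.
What I am looking for:
XPath string: /CATALOG/MAIN/CD. The document should be split based on this XPath.
1) Initial file: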
<CATALOG>
  <MAIN id="1">
    <CD>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
    </CD>
    <CD>
      <TITLE>Empire Dummy</TITLE>
      <ARTIST>John Doe</ARTIST>
    </CD>
    <USEFUL>Useful node</USEFUL>
  </MAIN>
  <MAIN id="2">
    <CD>
      <TITLE>Still got the blues</TITLE>
      <ARTIST>Gary More</ARTIST>
    </CD>
  </MAIN>
  <IGNORED>Ignored node</IGNORED>
</CATALOG>
2) Result:
File 1:
<CATALOG>
  <MAIN id="1">
    <CD>
      <TITLE>Empire Burlesque</TITLE>
      <ARTIST>Bob Dylan</ARTIST>
    </CD>
    <USEFUL>Useful node</USEFUL>
  </MAIN>
</CATALOG>
File 2:
<CATALOG>
  <MAIN id="1">
    <CD>
      <TITLE>Empire Dummy</TITLE>
      <ARTIST>John Doe</ARTIST>
    </CD>
    <USEFUL>Useful node</USEFUL>
  </MAIN>
</CATALOG>
File 3:
<CATALOG>
  <MAIN id="2">
    <CD>
      <TITLE>Still got the blues</TITLE>
      <ARTIST>Gary More</ARTIST>
    </CD>
  </MAIN>
</CATALOG>
Thanks for your time and suggestions.
Best regards!
Answer 0 (score: 1)
Below is code that does this with vtd-xml. Let me know if you run into any problems.
import com.ximpleware.*;
import java.io.FileOutputStream;

public class splitTest {
    public static void main(String[] a) throws VTDException, java.io.IOException {
        VTDGen vg = new VTDGen();
        // The second argument (false) turns namespace awareness off.
        if (vg.parseFile("C:\\Users\\Jimmy Zhang\\workspace\\ximple-dev\\DOMTest\\test111.xml", false)) {
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            // Note: this selects /CATALOG/MAIN, so each <MAIN> element becomes one output file.
            ap.selectXPath("/CATALOG/MAIN");
            byte[] header = "<CATALOG>".getBytes();
            byte[] tail = "</CATALOG>".getBytes();
            int j = 0;
            while (ap.evalXPath() != -1) {
                // Lower 32 bits: byte offset of the fragment; upper 32 bits: its length.
                long l = vn.getElementFragment();
                // try-with-resources closes the file even if a write fails.
                try (FileOutputStream fops = new FileOutputStream("c:\\xml\\output" + j + ".xml")) {
                    fops.write(header);
                    fops.write(vn.getXML().getBytes(), (int) l, (int) (l >> 32));
                    fops.write(tail);
                }
                j++;
            }
        }
    }
}
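Answer 1 (score: 0)
Here's my approach...
Use an XML parsing library such as javax.xml.parsers.DocumentBuilderFactory to build a DOM for the input XML file. For each matching node you encounter, create a new output document, e.g. Document1.xml, and add the node as a child under a copy of its parent node.
For sample Java code that parses XML with the javax.xml.parsers.* package, see http://www.programcreek.com/java-api-examples/index.php?api=javax.xml.parsers.DocumentBuilderFactory (look for the loadQuestions example).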
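To make the idea a bit more concrete, here is a rough sketch that uses only JDK classes. It is not tested against the asker's real data, and the names `DomSplitSketch` and `split` are made up for illustration. The rule for what to keep is my reading of the expected output in the question: the matched element, its non-matching siblings (such as USEFUL), and shallow copies of its ancestors with their attributes. It assumes the XPath selects elements only, and since it builds a full DOM it may not be suitable for a truly large file.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; the name is not from the original answer.
class DomSplitSketch {

    /** Returns one XML string per element matched by xpath, wrapped in copies of its ancestors. */
    static List<String> split(String xml, String xpath) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList matches = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, source, XPathConstants.NODESET);

        // Identity-based set, so the "is this sibling another match?" check does not rely on equals().
        Set<Node> matched = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        for (int i = 0; i < matches.getLength(); i++) {
            matched.add(matches.item(i));
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> results = new ArrayList<String>();
        for (int i = 0; i < matches.getLength(); i++) {
            Node match = matches.item(i);
            Document out = builder.newDocument();
            Node top;
            Node parent = match.getParentNode();
            if (parent instanceof Element) {
                // Shallow copy of the parent (attributes included), then its children
                // except the other matched elements.
                top = out.importNode(parent, false);
                for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
                    if (child == match || !matched.contains(child)) {
                        top.appendChild(out.importNode(child, true));
                    }
                }
                // Wrap in shallow copies of the remaining ancestors, up to the root element.
                for (Node a = parent.getParentNode(); a instanceof Element; a = a.getParentNode()) {
                    Node wrapper = out.importNode(a, false);
                    wrapper.appendChild(top);
                    top = wrapper;
                }
            } else {
                // The match is the root element itself: copy it whole.
                top = out.importNode(match, true);
            }
            out.appendChild(top);

            StringWriter text = new StringWriter();
            serializer.transform(new DOMSource(out), new StreamResult(text));
            results.add(text.toString());
        }
        return results;
    }
}
```

Calling `DomSplitSketch.split(xml, "/CATALOG/MAIN/CD")` should give one string per CD element, which you can then write out as Document1.xml, Document2.xml, and so on.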
Answer 2 (score: 0)
I solved my problem. Here is my approach, based on standard SAX parsing.
1) Create a custom SAX handler:
`
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Stack;
import javax.xml.namespace.NamespaceContext;
import org.apache.commons.lang3.StringUtils; // assumed: Apache Commons Lang 3; the post does not say which StringUtils
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class CustomSAXHandler extends DefaultHandler {
    // Open elements whose names appear in the XPath, innermost on top.
    private Stack<XmlNodeInfo> nodeStack = new Stack<XmlNodeInfo>();
    // Element names of the XPath steps, e.g. [CATALOG, MAIN, CD].
    private List<String> xPaths;
    private XmlNodeInfo rootNode;
    private final NamespaceContext namespaceContext;
    private List<XmlNodeInfo> resultNodes;

    public CustomSAXHandler(String xpath, XmlNodeInfo rootNode, NamespaceContext namespaceContext) {
        this.rootNode = rootNode;
        this.namespaceContext = namespaceContext;
        resultNodes = new ArrayList<XmlNodeInfo>();
        xPaths = splitXpaths(xpath);
    }

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        String element = "<" + qName + getAttributes(atts) + ">";
        if (!nodeStack.empty()) {
            rootNode = nodeStack.peek();
        }
        if (matchDefinedXpath(qName)) {
            // Element is one of the XPath steps: track it as its own node.
            XmlNodeInfo newNode = new XmlNodeInfo(qName);
            rootNode.addChild(newNode);
            nodeStack.push(newNode);
            newNode.getHeader().append(element);
        } else {
            // Any other element is copied as text into the body of the enclosing tracked node.
            if (!nodeStack.empty()) {
                nodeStack.peek().getBody().append(element);
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        XmlNodeInfo currentNode = nodeStack.empty() ? null : nodeStack.peek();
        if (currentNode != null) {
            currentNode.getBody().append(new String(ch, start, length));
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        String finalElement = xPaths.get(xPaths.size() - 1);
        String element = "</" + qName + ">";
        XmlNodeInfo currentNode = nodeStack.empty() ? null : nodeStack.peek();
        if (currentNode != null) {
            if (qName.equals(finalElement) && nodeStack.size() == xPaths.size()) {
                // End of an element selected by the full XPath: record it as a result.
                currentNode.getFooter().append(element);
                resultNodes.add(currentNode);
                nodeStack.pop();
            } else {
                if (currentNode.getName().equals(qName)) {
                    currentNode.getFooter().append(element);
                    nodeStack.pop();
                } else {
                    currentNode.getBody().append(element);
                }
            }
        }
    }

    public List<String> getResults() {
        List<String> results = new ArrayList<String>();
        for (XmlNodeInfo node : resultNodes) {
            buildDocument(node, null, results);
        }
        return results;
    }

    // Walks from a result node up to the root, wrapping the content in each ancestor's header and footer.
    private void buildDocument(XmlNodeInfo node, String childContent, List<String> results) {
        String body = node.getBody().toString();
        if (childContent != null) {
            body = body + childContent;
        }
        if (node.getParent() != null && !node.getParent().getName().equals(XmlNodeInfo.ROOT_NODE_NAME)) {
            String xmlContent = String.valueOf(node.getHeader()) + body + node.getFooter();
            buildDocument(node.getParent(), xmlContent, results);
        } else if (node.getParent() != null && node.getParent().getName().equals(XmlNodeInfo.ROOT_NODE_NAME)) {
            String finalContent = String.valueOf(node.getHeader()) + body + node.getFooter();
            results.add(finalContent);
        }
    }

    private String getAttributes(Attributes atts) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < atts.getLength(); i++) {
            String qName = atts.getQName(i);
            String value = atts.getValue(qName);
            builder.append(" ").append(qName).append("=").append("\"").append(value).append("\"");
        }
        return builder.toString();
    }

    private boolean matchDefinedXpath(String nodeName) {
        String[] splitWords = nodeName.split(":");
        if (splitWords.length == 2) {
            // Prefixed name: try every prefix bound to the same namespace URI.
            String namespacePrefix = splitWords[0];
            String namespaceURI = namespaceContext.getNamespaceURI(namespacePrefix);
            Iterator prefixes = namespaceContext.getPrefixes(namespaceURI);
            while (prefixes.hasNext()) {
                String prefix = (String) prefixes.next();
                String elementName = prefix + ":" + splitWords[1];
                if (xPaths.contains(elementName)) {
                    return true;
                }
            }
        } else {
            return xPaths.contains(nodeName);
        }
        return false;
    }

    // Splits "/CATALOG/MAIN/CD" into [CATALOG, MAIN, CD]; returns null for a blank XPath.
    private List<String> splitXpaths(String xPath) {
        if (StringUtils.isNotBlank(xPath)) {
            String[] splitWords = xPath.split("/");
            if (splitWords.length > 0) {
                List<String> results = new ArrayList<String>();
                for (String splitWord : splitWords) {
                    if (StringUtils.isNotBlank(splitWord)) {
                        results.add(splitWord);
                    }
                }
                return results;
            }
        }
        return null;
    }
}
`
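One thing to watch for with this handler: a SAX parser hands over text and attribute values already unescaped, so an `&amp;` in the input arrives as a plain `&` and is appended to the body as-is. If the data can contain `&`, `<` or quotes, the rebuilt strings would not be well-formed. A minimal sketch of an escaping helper follows; `XmlEscapeSketch` and `escape` are made-up names, not part of the original answer, and the idea would be to call it on the text in characters() and on each value in getAttributes().

```java
// Hypothetical helper; not part of the original answer.
class XmlEscapeSketch {

    /** Escapes characters that are unsafe in element text and in double-quoted attribute values. */
    static String escape(CharSequence raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, characters() could append `XmlEscapeSketch.escape(new String(ch, start, length))` instead of the raw string.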
2) Create a bean to store the node data:
`
import java.util.ArrayList;
import java.util.List;

public class XmlNodeInfo {
    public static final String ROOT_NODE_NAME = "ROOT";

    private String name;
    private StringBuilder header; // opening tag, including attributes
    private StringBuilder body;   // text and non-matching child markup
    private StringBuilder footer; // closing tag
    private List<XmlNodeInfo> children;
    private XmlNodeInfo parent;

    public XmlNodeInfo(String name) {
        this.name = name;
        header = new StringBuilder();
        body = new StringBuilder();
        footer = new StringBuilder();
        children = new ArrayList<XmlNodeInfo>();
    }

    public StringBuilder getHeader() {
        return header;
    }

    public StringBuilder getBody() {
        return body;
    }

    public StringBuilder getFooter() {
        return footer;
    }

    public List<XmlNodeInfo> getChildren() {
        return children;
    }

    public void addChild(XmlNodeInfo xmlNodeInfo) {
        children.add(xmlNodeInfo);
        xmlNodeInfo.setParent(this);
    }

    public String getName() {
        return name;
    }

    public XmlNodeInfo getParent() {
        return parent;
    }

    public void setParent(XmlNodeInfo parent) {
        this.parent = parent;
    }
}
`
3) Run the program:
`
import java.io.InputStream;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class MainApp {
    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser saxParser = factory.newSAXParser();
        // XmlNamespaceResolver is my own NamespaceContext implementation (not shown in this post).
        NamespaceContext namespaceContext = new XmlNamespaceResolver();
        String xPath = "/CATALOG/MAIN/CD";
        InputStream in = MainApp.class.getClassLoader().getResourceAsStream("test.xml");
        XmlNodeInfo rootNode = new XmlNodeInfo(XmlNodeInfo.ROOT_NODE_NAME);
        CustomSAXHandler customSAXHandler = new CustomSAXHandler(xPath, rootNode, namespaceContext);
        saxParser.parse(in, customSAXHandler);
        List<String> results = customSAXHandler.getResults(); // result strings
    }
}
`
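The XmlNamespaceResolver class used in MainApp is not included in the post. As a stand-in, here is a guess at what a minimal version could look like: a NamespaceContext backed by a prefix-to-URI map. The `bind` method and the map-based design are assumptions of mine, and the sketch skips the reserved `xml` and `xmlns` prefixes that a complete NamespaceContext is supposed to handle.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical stand-in for the class the answer does not show.
class XmlNamespaceResolver implements NamespaceContext {
    // Insertion-ordered so prefixes come back in the order they were bound.
    private final Map<String, String> prefixToUri = new LinkedHashMap<String, String>();

    /** Registers a prefix for a namespace URI; returns this so calls can be chained. */
    XmlNamespaceResolver bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty namespace URI, as the interface contract describes.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> prefixes = new ArrayList<String>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }
}
```

With this stand-in, `new XmlNamespaceResolver()` works unchanged for documents without prefixes, and something like `new XmlNamespaceResolver().bind("a", "urn:example")` (a made-up URI) would let the handler treat every prefix bound to that URI as equivalent.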
It may not be the best solution, but it solves my problem. Thanks for all your time and suggestions.