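I'm trying to change a single value in a large (5 MB) XML file. I always know the value will be within the first 10 lines, so I don't need to read 99% of the file. However, partial XML reading seems to be quite tricky in Java.
I've read a lot about XML in Java and the best practices for handling it. In this case, though, I'm not sure what the best approach is: DOM, StAX, and SAX parsers each seem to have different ideal use cases, and I'm not sure which one fits this problem best, since all I need to do is edit one value.
Maybe I shouldn't even use an XML parser and should just use a regex, but it looks like a pretty bad idea to use regex on XML.
Hopefully someone can point me in the right direction. Thanks!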
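Answer 0 (score: 2)
I would choose DOM over SAX or StAX simply for its (relatively) simple API. Yes, there is some boilerplate to populate the DOM, but once you're past that it's fairly straightforward.
That said, if your XML source were hundreds or thousands of megabytes, one of the streaming APIs would be a better fit. As it is, 5 MB is not what I'd consider a large dataset, so go ahead with DOM and call it a day: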
    import java.io.File;
    import javax.xml.parsers.*;
    import javax.xml.transform.*;
    import javax.xml.transform.dom.*;
    import javax.xml.transform.stream.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;

    public class ChangeVersion
    {
        public static void main(String[] args)
            throws Exception
        {
            if (args.length < 3) {
                System.err.println("Usage: ChangeVersion <input> <output> <new version>");
                System.exit(1);
            }

            File inputFile = new File(args[0]);
            File outputFile = new File(args[1]);
            int updatedVersion = Integer.parseInt(args[2], 10);

            // Parse the whole document into a DOM tree.
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = domFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(inputFile);

            // Select the Version attribute(s) to change.
            XPathFactory xpathFactory = XPathFactory.newInstance();
            XPath xpath = xpathFactory.newXPath();
            XPathExpression expr = xpath.compile("/PremiereData/Project/@Version");
            NodeList versionAttrNodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

            for (int i = 0; i < versionAttrNodes.getLength(); i++) {
                Attr versionAttr = (Attr) versionAttrNodes.item(i);
                versionAttr.setNodeValue(String.valueOf(updatedVersion));
            }

            // Serialize the modified tree to the output file.
            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(outputFile));
        }
    }
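To try the DOM approach without touching any files, the same parse, XPath, and serialize steps can be run on an in-memory string. This is only a minimal sketch: the class name `DomVersionSketch`, the helper `setVersion`, and the sample document are made up for illustration, and the XPath is the one used in this answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomVersionSketch {
    // Hypothetical helper: parses the XML string, rewrites
    // /PremiereData/Project/@Version, and returns the serialized result.
    static String setVersion(String xml, int newVersion) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // XPathConstants.NODE yields the first matching node, or null if none.
        Attr version = (Attr) XPathFactory.newInstance().newXPath()
                .evaluate("/PremiereData/Project/@Version", doc, XPathConstants.NODE);
        if (version == null) {
            throw new IllegalArgumentException("no /PremiereData/Project/@Version in input");
        }
        version.setValue(String.valueOf(newVersion));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, shaped like the XPath above expects.
        String sample = "<PremiereData><Project ObjectID=\"1\" Version=\"1\"/></PremiereData>";
        System.out.println(setVersion(sample, 2));
    }
}
```

Unlike the file-based version, this sketch fails loudly when the attribute is missing rather than silently writing an unchanged document; whether that is preferable depends on your input.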
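Answer 1 (score: 2)
You can use a StAX parser to write the XML out as you read it, replacing content along the way. With a StAX parser, only part of the XML is held in memory at any given time.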
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import javax.xml.namespace.QName;
    import javax.xml.stream.*;
    import javax.xml.stream.events.*;

    public class ChangeVersionStax {
        public static void main(String[] args) throws Exception {
            final String newVersion = "888";
            final QName versionName = new QName("Version");
            File inputFile = new File("in.xml");
            File outputFile = new File("out.xml");
            System.out.println("Reading " + inputFile);
            System.out.println("Writing " + outputFile);

            XMLEventFactory eventFactory = XMLEventFactory.newInstance();
            // Byte streams on both sides keep the platform default charset out of it
            // (FileWriter would use it). Writing UTF-8 assumes the input is UTF-8 too.
            try (FileInputStream in = new FileInputStream(inputFile);
                 FileOutputStream out = new FileOutputStream(outputFile)) {
                XMLEventReader eventReader = XMLInputFactory.newInstance().createXMLEventReader(in);
                XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");

                while (eventReader.hasNext()) {
                    XMLEvent event = eventReader.nextEvent();
                    boolean useExistingEvent = true; // write the reader's event unchanged?

                    // look for our Project element
                    if (event.getEventType() == XMLEvent.START_ELEMENT) {
                        StartElement elemEvent = event.asStartElement();
                        Attribute attr = elemEvent.getAttributeByName(QName.valueOf("ObjectID"));

                        // check to see if this is the project we want
                        // TODO: put what logic you want here
                        if ("Project".equals(elemEvent.getName().getLocalPart())
                                && attr != null && attr.getValue().equals("1")) {
                            // build a new attribute list for this element that
                            // leaves out the existing Version attribute
                            List<Attribute> newAttrs = new ArrayList<>();
                            Iterator<Attribute> existingAttrs = elemEvent.getAttributes();
                            while (existingAttrs.hasNext()) {
                                Attribute existing = existingAttrs.next();
                                // copy over everything but the Version attribute
                                if (!existing.getName().getLocalPart().equals("Version")) {
                                    newAttrs.add(existing);
                                }
                            }
                            // add the Version attribute with its new value; this also
                            // works when the element had no Version attribute before
                            newAttrs.add(eventFactory.createAttribute(versionName, newVersion));

                            // we're writing our own event instead of the existing one
                            useExistingEvent = false;
                            writer.add(eventFactory.createStartElement(
                                    elemEvent.getName(), newAttrs.iterator(), elemEvent.getNamespaces()));
                        }
                    }

                    // pass the existing event through untouched
                    if (useExistingEvent) {
                        writer.add(event);
                    }
                }
                writer.close();
                eventReader.close();
            }
        }
    }
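Since the value is known to sit near the top of the file, the same event-reading loop can also be used to check the current value and stop early, without pulling the rest of the document through the parser. Here is a rough sketch of that idea, not a definitive implementation: the class name `StaxPeekSketch` and the helper `firstProjectVersion` are hypothetical, and it assumes the first `Project` element is the one of interest.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class StaxPeekSketch {
    // Hypothetical helper: returns the Version attribute of the first
    // Project element, or null if there is no such element or attribute.
    static String firstProjectVersion(Reader in) throws Exception {
        XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(in);
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                if ("Project".equals(start.getName().getLocalPart())) {
                    Attribute version = start.getAttributeByName(new QName("Version"));
                    // Returning here ends the loop; later events are never requested.
                    return version == null ? null : version.getValue();
                }
            }
            return null;
        } finally {
            events.close();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input for illustration.
        String sample = "<PremiereData><Project ObjectID=\"1\" Version=\"7\"/><Other/></PremiereData>";
        System.out.println(firstProjectVersion(new StringReader(sample)));
    }
}
```

This only helps for reading. To change the value you still have to copy every remaining event to the output, as in the code above, because the result has to be a complete document.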