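I have been parsing XML like this for years, and I have to admit that as the number of distinct elements grows, I find it tedious and exhausting. Here is what I mean, with a sample dummy XML: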
<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item>
<ItemId> 987</ItemId>
<ItemName>Coupler</ItemName>
<Quantity>5</Quantity>
</Item>
<Item>
<ItemId>654</ItemId>
<ItemName>Connector</ItemName>
<Quantity unit="12">3</Quantity>
</Item>
<Item>
<ItemId>579</ItemId>
<ItemName>Clasp</ItemName>
<Quantity>1</Quantity>
</Item>
</Order>
Here is the relevant part (using SAX):
public class SaxParser extends DefaultHandler {
boolean isItem = false;
boolean isOrder = false;
boolean isDate = false;
boolean isCustomerId = false;
private Order order;
private Item item;
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
if (localName.equalsIgnoreCase("ORDER")) {
order = new Order();
}
if (localName.equalsIgnoreCase("DATE")) {
isDate = true;
}
if (localName.equalsIgnoreCase("CUSTOMERID")) {
isCustomerId = true;
}
if (localName.equalsIgnoreCase("ITEM")) {
isItem = true;
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (isDate){
SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
String value = new String(ch, start, length);
try {
order.setDate(formatter.parse(value));
} catch (ParseException e) {
e.printStackTrace();
}
}
if(isCustomerId){
order.setCustomerId(Integer.valueOf(new String(ch, start, length)));
}
if (isItem) {
item = new Item();
isItem = false;
}
}
}
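I would like to know whether there is a way to get rid of these ugly booleans, which keep multiplying as the number of elements grows. There must be a better way to parse this relatively simple XML. Just looking at the number of lines of code this task needs, it looks ugly.
I am currently using a SAX parser, but I am open to any other suggestion (other than DOM: I cannot afford an in-memory parser, because my XML files are huge).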
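Answer 0 (score: 6)
If you control the definition of the XML, you can use an XML binding tool such as JAXB (Java Architecture for XML Binding). In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling to and unmarshalling from XML becomes trivial.
Using JAXB does require more memory than a SAX handler, but there are ways to process an XML document in parts: Dealing with large documents.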
Answer 1 (score: 5)
Here is an example that uses JAXB together with StAX.
Input file:
<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
<Person>
<Name>Name 1</Name>
<Address>
<StreetAddress>Somestreet</StreetAddress>
<PostalCode>00001</PostalCode>
<CountryName>Finland</CountryName>
</Address>
</Person>
<Person>
<Name>Name 2</Name>
<Address>
<StreetAddress>Someotherstreet</StreetAddress>
<PostalCode>43400</PostalCode>
<CountryName>Sweden</CountryName>
</Address>
</Person>
</Personlist>
Person.java:
@XmlRootElement(name = "Person", namespace = "http://example.org")
public class Person {
@XmlElement(name = "Name", namespace = "http://example.org")
private String name;
@XmlElement(name = "Address", namespace = "http://example.org")
private Address address;
public String getName() {
return name;
}
public Address getAddress() {
return address;
}
}
Address.java:
public class Address {
@XmlElement(name = "StreetAddress", namespace = "http://example.org")
private String streetAddress;
@XmlElement(name = "PostalCode", namespace = "http://example.org")
private String postalCode;
@XmlElement(name = "CountryName", namespace = "http://example.org")
private String countryName;
public String getStreetAddress() {
return streetAddress;
}
public String getPostalCode() {
return postalCode;
}
public String getCountryName() {
return countryName;
}
}
PersonlistProcessor.java:
public class PersonlistProcessor {
public static void main(String[] args) throws Exception {
new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
.getResourceAsStream("personlist.xml"));
}
// TODO: Instead of throws Exception, all exceptions should be wrapped
// inside runtime exception
public void processPersonlist(InputStream inputStream) throws Exception {
JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
// Create unmarshaller
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
// Go to next tag
xss.nextTag();
// Require Personlist
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
// Go to next tag
while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
// Require Person
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
// Unmarshall person
Person person = (Person)unmarshaller.unmarshal(xss);
// Process person
processPerson(person);
}
// Require Personlist
xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
}
private void processPerson(Person person) {
System.out.println(person.getName());
System.out.println(person.getAddress().getCountryName());
}
}
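Answer 2 (score: 0)
In SAX the parser "pushes" events at your handler, so you have to do all the housekeeping yourself, as you are used to doing here. An alternative is StAX (the javax.xml.stream package), which is still streaming, but your code is responsible for "pulling" events from the parser. This way, the logic of which elements are expected in which order is encoded in the control flow of your program rather than having to be represented explicitly in booleans.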
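To make the pull style concrete, here is a minimal sketch that reads a cut-down version of the Order document from the question using only the JDK's javax.xml.stream API. The class name StaxOrderDemo, the helper readChild and the output format are made up for illustration, and the sketch assumes the child elements always appear in the order shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class (not from the answer): the expected element order
// is expressed by the order of the calls, so no boolean flags are needed.
class StaxOrderDemo {

    // Cut-down version of the Order document from the question.
    static final String SAMPLE =
          "<Order><Date>2003/07/04</Date><CustomerId>123</CustomerId>"
        + "<CustomerName>Acme Alpha</CustomerName>"
        + "<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName>"
        + "<Quantity>5</Quantity></Item>"
        + "<Item><ItemId>654</ItemId><ItemName>Connector</ItemName>"
        + "<Quantity unit=\"12\">3</Quantity></Item></Order>";

    static List<String> parse(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // move to the root start tag
            r.require(XMLStreamConstants.START_ELEMENT, null, "Order");
            lines.add("date=" + readChild(r, "Date"));
            lines.add("customerId=" + Integer.parseInt(readChild(r, "CustomerId")));
            lines.add("customerName=" + readChild(r, "CustomerName"));
            // Each iteration consumes exactly one <Item>...</Item>.
            while (r.nextTag() == XMLStreamConstants.START_ELEMENT) {
                r.require(XMLStreamConstants.START_ELEMENT, null, "Item");
                String id = readChild(r, "ItemId");
                String name = readChild(r, "ItemName");
                r.nextTag();
                r.require(XMLStreamConstants.START_ELEMENT, null, "Quantity");
                // Attributes must be read before the element text is consumed.
                String unit = r.getAttributeValue(null, "unit"); // null if absent
                String qty = r.getElementText().trim();
                r.nextTag(); // move to </Item>
                r.require(XMLStreamConstants.END_ELEMENT, null, "Item");
                lines.add(id + ": " + name + " x" + qty
                        + (unit == null ? "" : " (unit " + unit + ")"));
            }
            r.require(XMLStreamConstants.END_ELEMENT, null, "Order");
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Unexpected document structure", e);
        }
        return lines;
    }

    // Advances to the next start tag, checks its name and returns its text.
    static String readChild(XMLStreamReader r, String name) throws XMLStreamException {
        r.nextTag();
        r.require(XMLStreamConstants.START_ELEMENT, null, name);
        return r.getElementText().trim();
    }

    public static void main(String[] args) {
        for (String line : parse(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Because nextTag() skips whitespace between elements, the same code should also cope with the indented, multi-line form of the document. A document whose elements arrive in a different order fails fast in require() instead of silently setting the wrong flag.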
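Depending on the precise structure of the XML, there may be a "middle way" using a toolkit like XOM, which has a mode of operation where you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This works well for repetitive documents with many similar elements that can each be processed in isolation: you get the ease of programming against a tree-based API within each twig, but still have the streaming behaviour that lets you parse huge documents efficiently.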
public class ItemProcessor extends NodeFactory {
private Nodes emptyNodes = new Nodes();
public Nodes finishMakingElement(Element elt) {
if("Item".equals(elt.getLocalName())) {
// process the Item element here
System.out.println(elt.getFirstChildElement("ItemId").getValue()
+ ": " + elt.getFirstChildElement("ItemName").getValue());
// then throw it away
return emptyNodes;
} else {
return super.finishMakingElement(elt);
}
}
}
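Something similar can be achieved with a combination of StAX and JAXB: define JAXB-annotated classes that represent your repeating element (Item in this example), then create a StAX parser, navigate to the first Item start tag, and from there you can unmarshal one complete Item at a time from the XMLStreamReader.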
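Answer 3 (score: 0)
I have been using XStream to serialize my own objects to XML and then load them back as Java objects. If you can represent each tag as a POJO and annotate the POJOs properly to match the types in your XML file, you may find it much easier to use.
When a String holds the XML for an object, you simply write: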
Order theOrder = (Order)xstream.fromXML(xmlString);
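I have only used it to load a single object into memory, but if you need to stream and process as you go, you should be able to use a HierarchicalStreamReader to walk through the document. This may be very similar to Simple, which @Dave suggested.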
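Answer 4 (score: 0)
As others have said, the StAX model is a better approach for minimizing the memory footprint, because it is a streaming, pull-based model. I have personally used Axiom (which is used in Apache Axis2) and parsed elements with XPath expressions, which is less verbose than walking through node elements as you do in the code snippet you provided.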
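Answer 5 (score: 0)
I have been using this library. It sits on top of the standard Java library and makes things easier for me. In particular, you can ask for a specific element or attribute by name, rather than using the big "if" statement you have described.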
http://marketmovers.blogspot.com/2014/02/the-easy-way-to-read-xml-in-java.html
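Answer 6 (score: 0)
There is another library that supports more compact XML parsing: RTXML. The library and its documentation are at rasmustorkel.com. I implemented parsing of the file from the original question, and I include the complete program here: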
package for_so;
import java.io.File;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import rasmus_torkel.xml_basic.read.TagNode;
import rasmus_torkel.xml_basic.read.XmlReadOptions;
import rasmus_torkel.xml_basic.read.impl.XmlReader;
public class Q15626686_ReadOrder
{
public static class Order
{
public final Date _date;
public final int _customerId;
public final String _customerName;
public final ArrayList<Item> _itemAl;
public
Order(TagNode node)
{
_date = (Date)node.nextStringMappedFieldE("Date", Date.class);
_customerId = (int)node.nextIntFieldE("CustomerId");
_customerName = node.nextTextFieldE("CustomerName");
_itemAl = new ArrayList<Item>();
boolean finished = false;
while (!finished)
{
TagNode itemNode = node.nextChildN("Item");
if (itemNode != null)
{
Item item = new Item(itemNode);
_itemAl.add(item);
}
else
{
finished = true;
}
}
node.verifyNoMoreChildren();
}
}
public static final Pattern DATE_PATTERN = Pattern.compile("^(\\d\\d\\d\\d)\\/(\\d\\d)\\/(\\d\\d)$");
public static class Date
{
public final String _dateString;
public final int _year;
public final int _month;
public final int _day;
public
Date(String dateString)
{
_dateString = dateString;
Matcher matcher = DATE_PATTERN.matcher(dateString);
if (!matcher.matches())
{
throw new RuntimeException(dateString + " does not match pattern " + DATE_PATTERN.pattern());
}
_year = Integer.parseInt(matcher.group(1));
_month = Integer.parseInt(matcher.group(2));
_day = Integer.parseInt(matcher.group(3));
}
}
public static class Item
{
public final int _itemId;
public final String _itemName;
public final Quantity _quantity;
public
Item(TagNode node)
{
_itemId = node.nextIntFieldE("ItemId");
_itemName = node.nextTextFieldE("ItemName");
_quantity = new Quantity(node.nextChildE("Quantity"));
node.verifyNoMoreChildren();
}
}
public static class Quantity
{
public final int _unitSize;
public final int _unitQuantity;
public
Quantity(TagNode node)
{
_unitSize = node.attributeIntD("unit", 1);
_unitQuantity = node.onlyInt();
}
}
public static void
main(String[] args)
{
File xmlFile = new File(args[0]);
TagNode orderNode = XmlReader.xmlFileToRoot(xmlFile, "Order", XmlReadOptions.DEFAULT);
Order order = new Order(orderNode);
System.out.println("Read order for " + order._customerName + " which has " + order._itemAl.size() + " items");
}
}
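You will notice that the retrieval functions end in N, E or D. The suffix says what to do when the requested data item is not there: N stands for return null, E stands for throw an exception, and D stands for use a default value.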
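The suffix convention itself is easy to picture without the library. The sketch below does not use RTXML at all: it applies the same three behaviours to a plain Map of field names to text, and the class and method names (LookupSuffixDemo, textN, textE, intD) are invented for this illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the N / E / D naming convention on a plain Map.
// None of these methods belong to RTXML; they are hypothetical stand-ins.
class LookupSuffixDemo {

    // "N": return null when the field is absent.
    static String textN(Map<String, String> fields, String name) {
        return fields.get(name);
    }

    // "E": throw an exception when the field is absent.
    static String textE(Map<String, String> fields, String name) {
        String value = fields.get(name);
        if (value == null) {
            throw new IllegalStateException("Missing required field: " + name);
        }
        return value;
    }

    // "D": fall back to a caller-supplied default when the field is absent.
    static int intD(Map<String, String> fields, String name, int defaultValue) {
        String value = fields.get(name);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> quantity = new HashMap<>();
        quantity.put("value", "3");
        System.out.println(intD(quantity, "unit", 1));   // absent, so the default is used
        System.out.println(textN(quantity, "unit"));     // absent, so null
        System.out.println(textE(quantity, "value"));    // present
    }
}
```

Putting the policy in the method name keeps the decision about missing data visible at every call site, which is presumably why the program above reads required fields with the E variants and the optional unit attribute with a D variant.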
Answer 7 (score: 0)
A solution that uses no external packages, not even XPath: use an enum "PARSE_MODE", probably in combination with a Stack<PARSE_MODE>:
1) Basic solution:
a) Fields
private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
// NB: essential that all these enum values are upper case, but this is the convention anyway
private enum PARSE_MODE {
__UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
private List<String> parseModeStrings = new ArrayList<String>();
private Stack<PARSE_MODE> modeBreadcrumbs = new Stack<PARSE_MODE>();
b) Build your List<String>, perhaps in the constructor:
for( PARSE_MODE pm : PARSE_MODE.values() ){
// might want to check here that these are indeed upper case
parseModeStrings.add( pm.name() );
}
c) startElement and endElement:
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
String localNameUC = localName.toUpperCase();
// pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
assert ! localNameUC.equals( "__UNDEFINED__" );
if( parseModeStrings.contains( localNameUC )){
parseMode = PARSE_MODE.valueOf( localNameUC );
// any "policing" to do with which modes are allowed to switch into
// other modes could be put here...
// in your case, go `new Order()` here when parseMode == ORDER
modeBreadcrumbs.push( parseMode );
}
else {
// typically ignore the start of this element...
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
String localNameUC = localName.toUpperCase();
if( parseModeStrings.contains( localNameUC )){
// pop outside the assert, so the stack is still updated when assertions are disabled;
// the check will not fail unless the XML structure is malformed in some way
// or there is a coding error in use of the Stack, etc.:
PARSE_MODE popped = modeBreadcrumbs.pop();
assert popped == parseMode;
if( modeBreadcrumbs.empty() ){
parseMode = PARSE_MODE.__UNDEFINED__;
}
else {
parseMode = modeBreadcrumbs.peek();
}
}
else {
// typically ignore the end of this element...
}
}
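...so what does all this mean? At any time you know the "parse mode" you are in, and you can also look at the Stack<PARSE_MODE> modeBreadcrumbs if you need to find out which other parse modes you passed through to get there...
Your characters method then becomes substantially cleaner: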
public void characters(char[] ch, int start, int length) throws SAXException {
switch( parseMode ){
case DATE:
// PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
SimpleDateFormat formatter. ...
String value = ...
...
break;
case CUSTOMERID:
order.setCustomerId( ...
break;
case ITEM:
item = new Item();
// this next line probably won't be needed: when you get to endElement, if
// parseMode is ITEM, the previous mode will be restored automatically
// isItem = false ;
}
}
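The fragments above can be assembled into one small runnable handler. This is only a sketch under a few assumptions of mine, not the answer's exact code: it uses an ArrayDeque instead of Stack, pushes a catch-all OTHER mode for unknown elements so that every push has a matching pop, matches on qName (which is what a default, non-namespace-aware SAXParserFactory populates), and buffers text until endElement, because a SAX parser is allowed to deliver the text of one element in several characters() calls. The class name ModeStackHandler and the output strings are made up.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one enum plus a stack replaces the boolean flags.
class ModeStackHandler extends DefaultHandler {

    enum Mode { ORDER, DATE, CUSTOMERID, ITEM, ITEMID, ITEMNAME, OTHER }

    private final Deque<Mode> modes = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final List<String> seen = new ArrayList<>();

    // Maps an element name to a mode; unknown names become OTHER.
    private static Mode modeFor(String qName) {
        try {
            return Mode.valueOf(qName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return Mode.OTHER;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        modes.push(modeFor(qName));
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so only accumulate here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        Mode mode = modes.pop();
        String value = text.toString().trim();
        switch (mode) {
            case DATE:       seen.add("date=" + value); break;
            case CUSTOMERID: seen.add("customerId=" + Integer.parseInt(value)); break;
            case ITEMID:     seen.add("itemId=" + value); break;
            case ITEMNAME:   seen.add("itemName=" + value); break;
            default:         break; // ORDER, ITEM and OTHER carry no text of interest
        }
        text.setLength(0);
    }

    static List<String> parse(String xml) {
        ModeStackHandler handler = new ModeStackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        String xml = "<Order><Date>2003/07/04</Date><CustomerId>123</CustomerId>"
                + "<Item><ItemId>987</ItemId><ItemName>Coupler</ItemName>"
                + "<Quantity>5</Quantity></Item></Order>";
        parse(xml).forEach(System.out::println);
    }
}
```

Supporting a new element then means adding one enum constant and one case, rather than a new boolean plus two if blocks.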
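2) A more "professional" solution:
An abstract class which concrete classes have to extend, and which then have no ability to modify the Stack, etc. NB this examines qName rather than localName. Thus: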
public abstract class AbstractSAXHandler extends DefaultHandler {
protected enum PARSE_MODE implements SAXHandlerParseMode {
__UNDEFINED__
};
// abstract: the concrete subclasses must populate...
abstract protected Collection<Enum<?>> getPossibleModes();
//
private Stack<SAXHandlerParseMode> modeBreadcrumbs = new Stack<SAXHandlerParseMode>();
private Collection<Enum<?>> possibleModes;
private Map<String, Enum<?>> nameToEnumMap;
private Map<String, Enum<?>> getNameToEnumMap(){
// lazy creation and population of map
if( nameToEnumMap == null ){
if( possibleModes == null ){
possibleModes = getPossibleModes();
}
nameToEnumMap = new HashMap<String, Enum<?>>();
for( Enum<?> possibleMode : possibleModes ){
nameToEnumMap.put( possibleMode.name(), possibleMode );
}
}
return nameToEnumMap;
}
protected boolean isLegitimateModeName( String name ){
return getNameToEnumMap().containsKey( name );
}
protected SAXHandlerParseMode getParseMode() {
return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
try {
_startElement(uri, localName, qName, attributes);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses (NB I think checked Exceptions are not a brilliant design choice in Java)
protected void _startElement(String uri, String localName, String qName, Attributes attributes)
throws Exception {
String qNameUC = qName.toUpperCase();
// very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
if( getNameToEnumMap().containsKey( qNameUC )){
Enum<?> newMode = getNameToEnumMap().get( qNameUC );
modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
try {
_endElement(uri, localName, qName);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses
protected void _endElement(String uri, String localName, String qName) throws Exception {
String qNameUC = qName.toUpperCase();
if( getNameToEnumMap().containsKey( qNameUC )){
modeBreadcrumbs.pop();
}
}
public List<?> showModeBreadcrumbs(){
return org.apache.commons.collections4.ListUtils.unmodifiableList( modeBreadcrumbs );
}
}
interface SAXHandlerParseMode {
}
Then, the salient parts of the concrete subclass:
private enum PARSE_MODE implements SAXHandlerParseMode {
ORDER, DATE, CUSTOMERID, ITEM
};
private Collection<Enum<?>> possibleModes;
@Override
protected Collection<Enum<?>> getPossibleModes() {
// lazy initiation
if (possibleModes == null) {
List<SAXHandlerParseMode> parseModes = new ArrayList<SAXHandlerParseMode>( Arrays.asList(PARSE_MODE.values()) );
possibleModes = new ArrayList<Enum<?>>();
for( SAXHandlerParseMode parseMode : parseModes ){
possibleModes.add( PARSE_MODE.valueOf( parseMode.toString() ));
}
// __UNDEFINED__ mode (from abstract superclass) must be added afterwards
possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
}
return possibleModes;
}
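PS this is a starting point for more sophisticated stuff: for example, you might set up a List<Object> which is kept synchronised with the Stack<PARSE_MODE>: the Objects could then be anything you want, enabling you to "reach back" into the ascendant "XML nodes" of the one you are dealing with. Do not use a Map, though: the Stack can potentially contain the same PARSE_MODE object more than once. This in fact illustrates a fundamental characteristic of all tree-like structures: no individual node (here: parse mode) exists in isolation: its identity is always defined by the entire path leading to it.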
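One way to see the "identity is the whole path" point in code is to key on the path of element names instead of a single mode. The following is a rough sketch, not part of the solution above: the class name PathHandler is invented, and the sample document with a Name element under both Order and Item is a made-up variation of the question's XML, chosen so that the same element name occurs in two contexts.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that identifies each text value by its full element path.
class PathHandler extends DefaultHandler {

    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    final Map<String, List<String>> valuesByPath = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The key is the whole path, e.g. "Order/Item/Name", not just "Name".
        String key = String.join("/", path);
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            valuesByPath.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        path.removeLast();
        text.setLength(0);
    }

    static Map<String, List<String>> parse(String xml) {
        PathHandler handler = new PathHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.valuesByPath;
    }

    public static void main(String[] args) {
        String xml = "<Order><Name>Acme Alpha</Name>"
                + "<Item><Name>Coupler</Name></Item></Order>";
        parse(xml).forEach((p, v) -> System.out.println(p + " -> " + v));
    }
}
```

The two Name elements end up under different keys, which a lookup keyed on the element name alone could not distinguish.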
Answer 8 (score: -1)
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class JXML {
private DocumentBuilder builder;
private Document doc = null;
private DocumentBuilderFactory factory ;
private XPathExpression expr = null;
private XPathFactory xFactory;
private XPath xpath;
private String xmlFile;
public static ArrayList<String> XMLVALUE ;
public JXML(String xmlFile){
this.xmlFile = xmlFile;
}
private void xmlFileSettings(){
try {
factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
xFactory = XPathFactory.newInstance();
xpath = xFactory.newXPath();
builder = factory.newDocumentBuilder();
doc = builder.parse(xmlFile);
}
catch (Exception e){
System.out.println(e);
}
}
public String[] selectQuery(String query){
xmlFileSettings();
ArrayList<String> records = new ArrayList<String>();
try {
expr = xpath.compile(query);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
records.add(nodes.item(i).getNodeValue());
}
return records.toArray(new String[records.size()]);
}
catch (Exception e) {
System.out.println("There is error in query string");
return records.toArray(new String[records.size()]);
}
}
public boolean updateQuery(String query,String value){
xmlFileSettings();
try{
NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
for (int idx = 0; idx < nodes.getLength(); idx++) {
nodes.item(idx).setTextContent(value);
}
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File(this.xmlFile)));
return true;
}catch(Exception e){
System.out.println(e);
return false;
}
}
public static void main(String args[]){
JXML jxml = new JXML("c://user.xml");
jxml.updateQuery("//Order/CustomerId/text()","222");
String result[]=jxml.selectQuery("//Order/Item/*/text()");
for(int i=0;i<result.length;i++){
System.out.println(result[i]);
}
}
}