How can I get more information about an invalid DOM element through the Validator?

Date: 2011-11-10 09:33:41

Tags: java xml xml-validation

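I am using the javax.xml.validation.Validator class to validate an in-memory DOM object against an XSD schema. Whenever the information I populate the DOM with contains corrupted data, a SAXParseException is thrown during validation.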

Example error:

  

org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '???“?? [????? G?> ??? p~tn ?? ~0?1]' is not a valid value for 'hexBinary'.

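I am hoping there is a way to locate this error in my in-memory DOM and print out the offending element together with its parent element. My current code is: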

public void writeDocumentToFile(Document document) throws XMLWriteException {
  try {
    // Validate the document against the schema
    Validator validator = getSchema(xmlSchema).newValidator();
    validator.validate(new DOMSource(document));

    // Serialisation logic here.

  } catch(SAXException e) {
    throw new XMLWriteException(e); // This is being thrown
  } // Some other exceptions caught here.
}

private Schema getSchema(URL schema) throws SAXException {
  SchemaFactory schemaFactory = 
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

  // Some logic here to specify a ResourceResolver

  return schemaFactory.newSchema(schema);
}

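I looked at the Validator#setErrorHandler(ErrorHandler handler) method, but the ErrorHandler interface only gives me access to a SAXParseException, which exposes just the line and column number of the error. Because I am validating an in-memory DOM, both the line number and the column number come back as -1.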

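Is there a better way to do this? I would rather not validate the strings by hand before adding them to the DOM if the library already provides the functionality I am looking for.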

I am using JDK 6 update 26 and JDK 6 update 7, depending on where this code runs.

Edit: with this code added:

validator.setErrorHandler(new ErrorHandler() {
  @Override
  public void warning(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  @Override
  public void error(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  @Override
  public void fatalError(SAXParseException exception) throws SAXException {
    printException(exception);
    throw exception;
  }

  private void printException(SAXParseException exception) {
    System.out.println("exception.getPublicId() = " + exception.getPublicId());
    System.out.println("exception.getSystemId() = " + exception.getSystemId());
    System.out.println("exception.getColumnNumber() = " + exception.getColumnNumber());
    System.out.println("exception.getLineNumber() = " + exception.getLineNumber());
  }
});

I get this output:

exception.getPublicId() = null
exception.getSystemId() = null
exception.getColumnNumber() = -1
exception.getLineNumber() = -1

2 Answers:

Answer 0: (score: 5)

If you are using Xerces (the default in the Sun JDK), you can get the element that failed validation through the http://apache.org/xml/properties/dom/current-element-node property:

...
catch (SAXParseException e)
{
    Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");

    System.out.println("Validation error: " + e.getMessage());
    System.out.println("Element: " + curElement);
}   

Example:

String xml = "<root xmlns=\"http://www.myschema.org\">\n" +
             "<text>This is text</text>\n" +
             "<number>32</number>\n" +
             "<number>abc</number>\n" +
             "</root>";

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
Schema schema = getSchema(getClass().getResource("myschema.xsd"));

Validator validator = schema.newValidator();
try
{
    validator.validate(new DOMSource(doc));
}
catch (SAXParseException e)
{
    Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");

    System.out.println("Validation error: " + e.getMessage());
    System.out.println(curElement.getLocalName() + ": " + curElement.getTextContent());

    //Use curElement.getParentNode() or whatever you need here
}         
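
For anyone who wants to run this without a schema file, here is a minimal self-contained sketch of the same idea. It reads the property from inside an ErrorHandler that does not rethrow ordinary errors, so validation continues and every failing element gets reported, not just the first one. The class name InvalidElementReporter, the pathOf helper and the inline schema are illustrative assumptions, not part of the original code, and the property is only expected to work with the Xerces-based validator.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class InvalidElementReporter {
    // Xerces-specific property; other JAXP implementations may not recognise it.
    static final String CURRENT_ELEMENT =
        "http://apache.org/xml/properties/dom/current-element-node";

    // Illustrative inline schema (no target namespace), standing in for myschema.xsd.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root'><xs:complexType><xs:sequence>"
      + "<xs:element name='text' type='xs:string'/>"
      + "<xs:element name='number' type='xs:int' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XML =
        "<root><text>This is text</text><number>32</number><number>abc</number></root>";

    // Hypothetical helper: slash-separated element names from the root down to the node.
    public static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    // Validates doc and returns one "path: message" entry per reported problem.
    public static List<String> findProblems(Schema schema, Document doc) throws Exception {
        final Validator validator = schema.newValidator();
        final List<String> problems = new ArrayList<String>();
        validator.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { record(e); }
            public void error(SAXParseException e) { record(e); }
            public void fatalError(SAXParseException e) throws SAXException { record(e); throw e; }

            private void record(SAXParseException e) {
                Object current = null;
                try {
                    current = validator.getProperty(CURRENT_ELEMENT);
                } catch (SAXException notSupported) {
                    // Property unavailable: fall through without a location.
                }
                String where = (current instanceof Node) ? pathOf((Node) current) : "(unknown element)";
                problems.add(where + ": " + e.getMessage());
            }
        });
        validator.validate(new DOMSource(doc));
        return problems;
    }

    public static List<String> demo() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // schema validation needs a namespace-aware DOM
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        return findProblems(schema, doc);
    }

    public static void main(String[] args) throws Exception {
        for (String problem : demo()) {
            System.out.println(problem);
        }
    }
}
```

The validator may report more than one message for the same element, so the list can contain several entries with the same path. From the Node you can also call getParentNode() directly if you only need the immediate parent rather than the whole path.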

If you need line/column numbers from the DOM, this answer addresses that problem.
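
The linked answer is not reproduced on this page. As a rough illustration of one common technique (not necessarily the one that answer uses), you can build the DOM yourself from SAX events and stash the Locator's line number on each element with setUserData. This only helps when the DOM comes from parsed text; a DOM assembled in memory, as in the question, has no source positions to record. The class name LineNumberDom and the "lineNumber" key are assumptions made for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class LineNumberDom {
    // Hypothetical user-data key; any unique string would do.
    public static final String LINE_KEY = "lineNumber";

    // SAX reports "no namespace" as an empty string, DOM expects null.
    private static String nullIfEmpty(String s) {
        return (s == null || s.isEmpty()) ? null : s;
    }

    // Parses xml into a DOM whose elements carry the line of their start tag as user data.
    public static Document parse(String xml) throws Exception {
        final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        final Deque<Node> open = new ArrayDeque<Node>();
        open.push(doc);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                Element el = doc.createElementNS(nullIfEmpty(uri), qName);
                // Copies ordinary attributes; xmlns declarations are not reported by default.
                for (int i = 0; i < atts.getLength(); i++) {
                    el.setAttributeNS(nullIfEmpty(atts.getURI(i)), atts.getQName(i), atts.getValue(i));
                }
                if (locator != null) {
                    el.setUserData(LINE_KEY, locator.getLineNumber(), null);
                }
                open.peek().appendChild(el);
                open.push(el);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        });
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<root>\n<a/>\n<b>x</b>\n</root>");
        Node b = doc.getDocumentElement().getElementsByTagName("b").item(0);
        System.out.println("b starts on line " + b.getUserData(LINE_KEY));
    }
}
```

With this in place, the element returned by the current-element-node property can be asked for getUserData(LineNumberDom.LINE_KEY) when reporting a validation error.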

Answer 1: (score: 0)

SAXParseException exposes a SystemId and a PublicId. Does that not give you enough information?