I don't know why this doesn't work. I'm trying to parse some XML files that reference a .dtd file, which I resolve myself. Unfortunately it fails with an org.xml.sax.SAXParseException.
try {
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    // Serve meter.dtd from the project resources instead of the path named in the DOCTYPE
    dBuilder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (!systemId.contains("meter.dtd")) {
                return null; // let the parser resolve any other entity itself
            }
            String path = null;
            try {
                File dtd = Resource.getFileFromResource("meter_corpus/sgml_dtds/meter.dtd");
                path = dtd.getAbsolutePath();
            } catch (Exception e) {
                e.printStackTrace();
            }
            if (path == null) {
                return null; // DTD not found; fall back to default resolution
            }
            return new InputSource(new FileReader(path));
        }
    });
    xmlDocument = dBuilder.parse(xmlFile);
    xmlDocument.getDocumentElement().normalize();
} catch (Exception e) {
    e.printStackTrace();
}
The meter.dtd file:
<!ELEMENT meterdocument - - (title?,body)>
<!ATTLIST meterdocument
classification CDATA #IMPLIED
pagenumber NUMBER #IMPLIED
filename CDATA #REQUIRED
newspaper CDATA #REQUIRED
domain CDATA #REQUIRED
date CDATA #REQUIRED
catchline CDATA #REQUIRED >
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT body - - (((verbatim | rewrite | new)+) | unclassified)>
<!ELEMENT verbatim - - (#PCDATA)>
<!ATTLIST verbatim PAsource CDATA #IMPLIED>
<!ELEMENT rewrite - - (#PCDATA)>
<!ATTLIST rewrite PAsource CDATA #IMPLIED>
<!ELEMENT new - - (#PCDATA)>
<!ATTLIST new PAsource CDATA #IMPLIED>
<!ELEMENT unclassified - - (#PCDATA)>
The file that should be parseable:
<!DOCTYPE meterdocument SYSTEM "meter.dtd" [
]>
<meterdocument filename="/meter_corpus/newspapers/annotated/courts/01.03.00/football/football382_star.sgml" newspaper="star" domain="courts" classification="wholly-derived" pagenumber="12" date="01.03.00" catchline="football">
<body>
<Verbatim PAsource="" >SIX football fans will</Verbatim>
<Rewrite PAsource="" > find out </Rewrite>
<Rewrite PAsource="" >today </Rewrite>
<Rewrite PAsource="" >whether they have won their fight to stop </Rewrite>
<Verbatim PAsource="" >Newcastle United </Verbatim>
<Rewrite PAsource="" >moving </Rewrite>
<Verbatim PAsource="" >their seats. </Verbatim>
<Verbatim PAsource="" >Mr Justice Blackburne, sitting at Newcastle High Court, </Verbatim>
<Rewrite PAsource="" >will reveal </Rewrite>
<Verbatim PAsource="" >his </Verbatim>
<Rewrite PAsource="" >decision over the </Rewrite>
<Verbatim PAsource="" >season ticket holders' </Verbatim>
<Rewrite PAsource="" >battle </Rewrite>
<Verbatim PAsource="" >at noon.
</Verbatim>
</body>
</meterdocument>
And the full stack trace, including the line that fails:
[Fatal Error] :32:43: The attribute type is required in the declaration of attribute "pagenumber" for element "meterdocument".
org.xml.sax.SAXParseException; lineNumber: 32; columnNumber: 43; The attribute type is required in the declaration of attribute "pagenumber" for element "meterdocument".
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
at eval.meter.METERDocument.<init>(METERDocument.java:68)
at eval.meter.METERCorpus.runMeterCorpusTest(METERCorpus.java:194)
at eval.meter.METERCorpus.main(METERCorpus.java:92)
java.lang.NullPointerException
at eval.meter.METERDocument.<init>(METERDocument.java:74)
at eval.meter.METERCorpus.runMeterCorpusTest(METERCorpus.java:194)
at eval.meter.METERCorpus.main(METERCorpus.java:92)
What do I have to do to parse this file correctly?
Answer 0 (score: 0)
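Try NMTOKEN (name token), or another XML attribute type, instead of NUMBER. NUMBER is an SGML declared value; an XML DTD only accepts CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION or an enumerated list, which is why the parser reports that the attribute type is required. The #IMPLIED default may also be worth revisiting.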
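A minimal sketch of that suggestion, under a few assumptions: the DTD is converted to XML syntax by hand (NUMBER becomes NMTOKEN, and the SGML "- -" tag-omission markers, which XML element declarations do not allow, are dropped), and both the DTD and a shortened copy of the sample document are kept in strings so the example runs on its own. The class name MeterDtdDemo and its members are hypothetical. The resolver mirrors the question's approach of matching "meter.dtd" in the system ID.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class MeterDtdDemo {

    // Hand-converted XML version of meter.dtd (assumption): "- -" markers removed,
    // SGML-only NUMBER replaced by NMTOKEN.
    static final String XML_DTD =
          "<!ELEMENT meterdocument (title?,body)>\n"
        + "<!ATTLIST meterdocument\n"
        + "  classification CDATA #IMPLIED\n"
        + "  pagenumber NMTOKEN #IMPLIED\n"
        + "  filename CDATA #REQUIRED\n"
        + "  newspaper CDATA #REQUIRED\n"
        + "  domain CDATA #REQUIRED\n"
        + "  date CDATA #REQUIRED\n"
        + "  catchline CDATA #REQUIRED>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT body (((verbatim | rewrite | new)+) | unclassified)>\n"
        + "<!ELEMENT verbatim (#PCDATA)>\n"
        + "<!ATTLIST verbatim PAsource CDATA #IMPLIED>\n"
        + "<!ELEMENT rewrite (#PCDATA)>\n"
        + "<!ATTLIST rewrite PAsource CDATA #IMPLIED>\n"
        + "<!ELEMENT new (#PCDATA)>\n"
        + "<!ATTLIST new PAsource CDATA #IMPLIED>\n"
        + "<!ELEMENT unclassified (#PCDATA)>\n";

    // Just the original SGML-style attribute declaration, to reproduce the error.
    static final String SGML_ATTLIST =
        "<!ATTLIST meterdocument pagenumber NUMBER #IMPLIED>\n";

    // Shortened copy of the sample document from the question.
    static final String SAMPLE =
          "<!DOCTYPE meterdocument SYSTEM \"meter.dtd\" [\n]>\n"
        + "<meterdocument filename=\"/meter_corpus/newspapers/annotated/courts/01.03.00/football/football382_star.sgml\""
        + " newspaper=\"star\" domain=\"courts\" classification=\"wholly-derived\""
        + " pagenumber=\"12\" date=\"01.03.00\" catchline=\"football\">\n"
        + "<body>\n"
        + "<Verbatim PAsource=\"\">SIX football fans will</Verbatim>\n"
        + "<Rewrite PAsource=\"\"> find out </Rewrite>\n"
        + "</body>\n"
        + "</meterdocument>\n";

    // Parses xml, answering any request for meter.dtd with dtdText from memory.
    static Document parse(String xml, String dtdText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) ->
            systemId != null && systemId.contains("meter.dtd")
                ? new InputSource(new StringReader(dtdText))
                : null); // null = default resolution for anything else
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        doc.getDocumentElement().normalize();
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE, XML_DTD);
        System.out.println(doc.getDocumentElement().getAttribute("pagenumber")); // prints 12

        try {
            parse(SAMPLE, SGML_ATTLIST);
        } catch (SAXParseException e) {
            // Same failure as in the question: NUMBER is not an XML attribute type
            System.out.println("SGML declaration rejected: " + e.getMessage());
        }
    }
}
```

Because DocumentBuilderFactory is non-validating by default, the capitalized Verbatim and Rewrite tags are not checked against the lowercase declarations. XML names are case-sensitive, so with setValidating(true) those tags would be reported as undeclared, and the DTD and the documents would need to agree on case.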