I am using org.jdom.input.SAXBuilder to parse an input stream (an XML file) into a JDOM document.
SAXBuilder builder = new SAXBuilder(JavaScriptParser.class.getName());
org.jdom.Document document = builder.build(stream);
My XML stream contains specific tags whose content must be treated as CDATA.
Example:
<text><![CDATA[ Any text... ]]></text>
<javascript><![CDATA[ function doSomething(){} ]]></javascript>
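Currently, the example above parses correctly. What I have spent two days trying to do is extend org.apache.xerces.impl.XMLScanner so that the example below parses to the same result as the one above.
Example: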
<text>Any text...</text>
<javascript>function doSomething(){}</javascript>
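To make the difference between the two forms concrete, here is a minimal sketch. It uses the JDK's built-in DOM parser rather than JDOM, purely so that it is self-contained; the class name CdataDemo and the sample script are hypothetical. The assumption is that real scripts will contain characters such as `<` or `&`, which a standard parser accepts only inside a CDATA section.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CdataDemo {
    /** Parses the XML and returns the root element's text, or null if parsing fails. */
    public static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates text and CDATA children of the element
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // A raw '<' inside element content is not well-formed XML
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical script body containing a '<' character
        String wrapped = "<javascript><![CDATA[ if (a < b) { run(); } ]]></javascript>";
        String raw = "<javascript> if (a < b) { run(); } </javascript>";
        System.out.println("with CDATA:    " + rootText(wrapped));
        System.out.println("without CDATA: " + rootText(raw));
    }
}
```

The second call fails with a default parser, which is the behavior I am trying to change for these specific tags.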
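Here is what I have so far (the JavaScriptScanner class below). I copied scanCDATASection() from the parent implementation and changed only the fEntityScanner.scanData("</javascript>", fStringBuffer) line so that it looks for my closing tag instead of the double closing brackets. I am having trouble pinning down the problem, but I suspect that either EntityScanner.scanData("</javascript>") is not working as expected, since I never enter the "if" branch, or the call to scanCDATASection() that I injected into scanContent() is causing the problem. I have to imagine there is a cleaner, more direct way to accomplish this. I have no experience customizing an XML parser, and every use of SAXParser in our application relies on the default settings. Any suggestions or tips would be greatly appreciated.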
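import java.io.IOException;
import org.apache.xerces.impl.XMLEntityManager;
import org.apache.xerces.util.XMLChar;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.XNIException;

public class JavaScriptScanner extends org.apache.xerces.impl.XMLNSDocumentScannerImpl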
{
    private final XMLStringBuffer fStringBuffer = new XMLStringBuffer();

    public JavaScriptScanner()
    {
        super();
    }

    @Override
    protected int scanContent() throws IOException, XNIException
    {
        if ("javascript".equals(fCurrentElement.rawname))
        {
            scanCDATASection();
            setScannerState(SCANNER_STATE_CONTENT);
        }
        return super.scanContent();
    }

    protected boolean scanCDATASection() throws IOException, XNIException
    {
        // call handler
        if (fDocumentHandler != null) {
            fDocumentHandler.startCDATA(null);
        }
        while (true) {
            fStringBuffer.clear();
            if (!fEntityScanner.scanData("</javascript>", fStringBuffer) ||
                fStringBuffer.toString().contains("</javascript")) {
                if (fDocumentHandler != null && fStringBuffer.length > 0) {
                    fDocumentHandler.characters(fStringBuffer, null);
                }
                int brackets = 0;
                while (fEntityScanner.skipChar(']')) {
                    brackets++;
                }
                if (fDocumentHandler != null && brackets > 0) {
                    fStringBuffer.clear();
                    if (brackets > XMLEntityManager.DEFAULT_BUFFER_SIZE) {
                        // Handle large sequences of ']'
                        int chunks = brackets / XMLEntityManager.DEFAULT_BUFFER_SIZE;
                        int remainder = brackets % XMLEntityManager.DEFAULT_BUFFER_SIZE;
                        for (int i = 0; i < XMLEntityManager.DEFAULT_BUFFER_SIZE; i++) {
                            fStringBuffer.append(']');
                        }
                        for (int i = 0; i < chunks; i++) {
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                        if (remainder != 0) {
                            fStringBuffer.length = remainder;
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                    }
                    else {
                        for (int i = 0; i < brackets; i++) {
                            fStringBuffer.append(']');
                        }
                        fDocumentHandler.characters(fStringBuffer, null);
                    }
                }
                if (fEntityScanner.skipChar('>')) {
                    break;
                }
                if (fDocumentHandler != null) {
                    fStringBuffer.clear();
                    fStringBuffer.append("]]");
                    fDocumentHandler.characters(fStringBuffer, null);
                }
            }
            else {
                if (fDocumentHandler != null) {
                    fDocumentHandler.characters(fStringBuffer, null);
                }
                int c = fEntityScanner.peekChar();
                if (c != -1 && isInvalidLiteral(c)) {
                    if (XMLChar.isHighSurrogate(c)) {
                        fStringBuffer.clear();
                        scanSurrogates(fStringBuffer);
                        if (fDocumentHandler != null) {
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                    }
                    else {
                        reportFatalError("InvalidCharInCDSect",
                            new Object[]{Integer.toString(c,16)});
                        fEntityScanner.scanChar();
                    }
                }
            }
        }
        fMarkupDepth--;
        // call handler
        if (fDocumentHandler != null) {
            fDocumentHandler.endCDATA(null);
        }
        return true;
    }
}
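For comparison with the scanner subclass above, one more direct route might be to pre-process the text before it reaches SAXBuilder, wrapping the content of the special tags in CDATA so that the default parser can stay untouched. The following is only a sketch under several assumptions: the class name CdataWrapper is hypothetical, the tags carry no attributes, the whole document fits in memory as a String, and the script never contains its own literal closing tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdataWrapper {
    // Matches <text>...</text> and <javascript>...</javascript> without attributes (assumption)
    private static final Pattern RAW =
            Pattern.compile("<(text|javascript)>(.*?)</\\1>", Pattern.DOTALL);

    /** Returns the XML with the content of the special tags wrapped in CDATA sections. */
    public static String wrap(String xml) {
        Matcher m = RAW.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group(1);
            String body = m.group(2);
            // Leave content alone if it is already wrapped
            if (!body.trim().startsWith("<![CDATA[")) {
                // Split any "]]>" so it cannot terminate the section early
                body = "<![CDATA[" + body.replace("]]>", "]]]]><![CDATA[>") + "]]>";
            }
            String replacement = "<" + tag + ">" + body + "</" + tag + ">";
            // quoteReplacement keeps '$' and '\' in scripts from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("<root><javascript>if (a < b) {}</javascript></root>"));
    }
}
```

The wrapped string could then be handed to builder.build(new java.io.StringReader(wrapped)) with a default SAXBuilder. Whether this is acceptable depends on document size and on how strictly the tag format can be guaranteed.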