JDOM SAXBuilder - Assume CDATA for specific tags

Time: 2013-03-01 22:56:58

Tags: java xml cdata jdom

I am using org.jdom.input.SAXBuilder to parse an input stream (an XML file) into a JDOM Document.

SAXBuilder builder = new SAXBuilder(JavaScriptParser.class.getName());
org.jdom.Document document = builder.build(stream);

My XML stream contains specific tags whose content must be treated as CDATA.

Example:

<text><![CDATA[ Any text... ]]></text>
<javascript><![CDATA[ function doSomething(){} ]]></javascript>

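Currently the example above parses. What I have spent two days trying to do is extend org.apache.xerces.impl.XMLScanner so that the example below parses to the same content as the one above.

Example: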

<text>Any text...</text>
<javascript>function doSomething(){}</javascript>
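The sample content above happens to be well-formed as plain text, so the difference only shows up once the script contains markup characters such as < or &. The sketch below illustrates this with the JDK's built-in DOM parser as a stand-in for SAXBuilder (an assumption made to keep the example free of dependencies); the class name CdataDemo and the helper parseText are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class CdataDemo {
    /** Returns the text of the root element, or null if the input is not well-formed XML. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (SAXException e) {
            return null; // well-formedness error
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the markup characters are passed through as character data.
        System.out.println(parseText("<javascript><![CDATA[if (a < b && c) {}]]></javascript>"));
        // Without CDATA the same script is rejected by the parser, so this prints null.
        System.out.println(parseText("<javascript>if (a < b && c) {}</javascript>"));
    }
}
```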

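Here is what I have so far. I copied scanCDATASection() from the parent implementation and changed only the fEntityScanner.scanData("</javascript>", fStringBuffer) line so that it looks for my closing tag instead of the double closing bracket. I am having trouble pinning down the problem, but I suspect that EntityScanner.scanData("</javascript>") is not working as expected, because I never enter the "if" statement. Or perhaps the call to scanCDATASection() that I injected into scanContent() is causing the problem. I have to imagine there is a cleaner, more direct way to accomplish this. I have no experience customizing XML parsers; every use of SAXParser in our application relies on the default settings. Any suggestions or tips would be greatly appreciated.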

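import java.io.IOException;

import org.apache.xerces.impl.XMLEntityManager;
import org.apache.xerces.util.XMLChar;
import org.apache.xerces.util.XMLStringBuffer;
import org.apache.xerces.xni.XNIException;

public class JavaScriptScanner extends org.apache.xerces.impl.XMLNSDocumentScannerImpl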
{
    private final XMLStringBuffer fStringBuffer = new XMLStringBuffer();

    public JavaScriptScanner()
    {
        super();
    }

    @Override
    protected int scanContent() throws IOException, XNIException 
    {
        if ("javascript".equals(fCurrentElement.rawname))
        {
            scanCDATASection();
            setScannerState(SCANNER_STATE_CONTENT);
        }

        return super.scanContent();
    }

    protected boolean scanCDATASection() throws IOException, XNIException 
    {
        // call handler
        if (fDocumentHandler != null) {
            fDocumentHandler.startCDATA(null);
        }

        while (true) {
            fStringBuffer.clear();
            if (!fEntityScanner.scanData("</javascript>", fStringBuffer) || 
                    fStringBuffer.toString().contains("</javascript")) {
                if (fDocumentHandler != null && fStringBuffer.length > 0) {
                    fDocumentHandler.characters(fStringBuffer, null);
                }
                int brackets = 0;
                while (fEntityScanner.skipChar(']')) {
                    brackets++;
                }
                if (fDocumentHandler != null && brackets > 0) {
                    fStringBuffer.clear();
                    if (brackets > XMLEntityManager.DEFAULT_BUFFER_SIZE) {
                        // Handle large sequences of ']'
                        int chunks = brackets / XMLEntityManager.DEFAULT_BUFFER_SIZE;
                        int remainder = brackets % XMLEntityManager.DEFAULT_BUFFER_SIZE;
                        for (int i = 0; i < XMLEntityManager.DEFAULT_BUFFER_SIZE; i++) {
                            fStringBuffer.append(']');
                        }
                        for (int i = 0; i < chunks; i++) {
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                        if (remainder != 0) {
                            fStringBuffer.length = remainder;
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                    }
                    else {
                        for (int i = 0; i < brackets; i++) {
                            fStringBuffer.append(']');
                        }
                        fDocumentHandler.characters(fStringBuffer, null);
                    }
                }
                if (fEntityScanner.skipChar('>')) {
                    break;
                }
                if (fDocumentHandler != null) {
                    fStringBuffer.clear();
                    fStringBuffer.append("]]");
                    fDocumentHandler.characters(fStringBuffer, null);
                }
            }
            else {
                if (fDocumentHandler != null) {
                    fDocumentHandler.characters(fStringBuffer, null);
                }
                int c = fEntityScanner.peekChar();
                if (c != -1 && isInvalidLiteral(c)) {
                    if (XMLChar.isHighSurrogate(c)) {
                        fStringBuffer.clear();
                        scanSurrogates(fStringBuffer);
                        if (fDocumentHandler != null) {
                            fDocumentHandler.characters(fStringBuffer, null);
                        }
                    }
                    else {
                        reportFatalError("InvalidCharInCDSect",
                                        new Object[]{Integer.toString(c,16)});
                        fEntityScanner.scanChar();
                    }
                }
            }
        }
        fMarkupDepth--;

        // call handler
        if (fDocumentHandler != null) {
            fDocumentHandler.endCDATA(null);
        }

        return true;
    }
}
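A possibly less invasive alternative to subclassing the Xerces scanner would be to pre-process the text before it reaches SAXBuilder, wrapping the body of the chosen tags in CDATA. The following is only a sketch under stated assumptions: the class CdataWrapper and its wrap method are hypothetical names, the whole document is assumed to fit in memory as a String, the element body is assumed to end at the first matching closing tag (the same assumption the scanData call above makes), and forms such as <javascript /> with a space before the slash are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CdataWrapper {
    /** Wraps the body of every <tag>...</tag> element in CDATA unless it already starts with a CDATA section. */
    static String wrap(String xml, String tag) {
        String q = Pattern.quote(tag);
        // group 1: opening tag (optionally with attributes), group 2: body, group 3: closing tag
        Pattern p = Pattern.compile("(<" + q + "(?:\\s[^>]*)?>)(.*?)(</" + q + "\\s*>)", Pattern.DOTALL);
        Matcher m = p.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(2);
            if (!body.trim().startsWith("<![CDATA[")) {
                // "]]>" may not appear inside a CDATA section, so split it across two sections.
                body = "<![CDATA[" + body.replace("]]>", "]]]]><![CDATA[>") + "]]>";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + body + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(wrap("<root><javascript>if (a < b) {}</javascript></root>", "javascript"));
    }
}
```

The wrapped string could then be passed to the existing builder through a java.io.StringReader, for example builder.build(new StringReader(wrapped)), so the default parser settings stay untouched.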

0 answers:

No answers