Skipping/parsing special tags

Time: 2015-09-13 06:32:51

Tags: jsoup

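I am writing an Eclipse plugin tool that fixes HTML attribute problems. The UI code uses Spring and a few other frameworks. When I parse a file and write it back, the framework tags are not written correctly. I also have <script> tags, and those do not come out as expected either. I want "<#" and "<@" to be left untouched or written back correctly. Please help.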

Input code ----------

<#macro contentcol>
  <p data-taganalytics="trackSection" data-taglocation="AddABankAccount">
        <a href="#" id="faq" class="btnSmall jq-modal" role="button" rel="<@spring.message "linkaccount.addaccount.faq.questionid" />">
                <@spring.message "linkaccount.addaccount.faq.text" />
        </a>
   </p>

  <#if (spring.status?? && spring.status.errorMessages?exists && spring.status.errorMessages?is_sequence && spring.status.errorMessages?size > 0 ) >
        <@tom.message style="error">
            <p>
                <strong>
                    <#list spring.status.errorMessages as error>
                            <li>${error}</li>
                    </#list>
                </strong>
            </p>
        </@tom.message>
  </#if>             

Output code ----------

<html><head></head><body>&lt;#macro contentcol&gt;
  <p data-taganalytics="trackSection" data-taglocation="AddABankAccount">
        <a href="#" id="faq" class="btnSmall jq-modal" role="button" rel="&lt;@spring.message " linkaccount.addaccount.faq.questionid"=""></a>&quot;&gt;
                &lt;@spring.message &quot;linkaccount.addaccount.faq.text&quot; /&gt;

   </p>

  &lt;#if (spring.status?? &amp;&amp; spring.status.errorMessages?exists &amp;&amp; spring.status.errorMessages?is_sequence &amp;&amp; spring.status.errorMessages?size &gt; 0 ) &gt;
        &lt;@tom.message style=&quot;error&quot;&gt;
            <p>
                <strong>
                    &lt;#list spring.status.errorMessages as error&gt;
                            </strong></p><li><strong>${error}</strong></li><strong>
                    <!--#list-->
                </strong>
            <p></p>
        <!--@tom.message-->
  <!--#if-->             

Here is my code for reading and parsing the file:

htmlFile = DocumentUtil.fixCompliance(Jsoup.parse(in, "ISO-8859-1"));

Here is my code for writing the file:

Document.OutputSettings settings = document.outputSettings();

settings.prettyPrint(false);

settings.escapeMode(Entities.EscapeMode.base);

settings.charset("ASCII");

System.out.println(document.html().toString());

writer = new PrintWriter(in, "ASCII"); 

writer.write(document.html());

writer.flush();

writer.close();

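I have tried both UTF-8 and ASCII.

1 answer:

Answer 0 (score: 0)

I had to modify the Jsoup source code to meet my requirements. The changes are shown below; see the inline comments.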

TokeniserState.java

    EndTagOpen {
    void read(Tokeniser t, CharacterReader r) {
        if (r.isEmpty()) {
            t.eofError(this);
            t.emit("</");
            t.transition(Data);
        } else if (r.matchesLetter()) {
            t.createTagPending(false);
            t.transition(TagName);
        } else if (r.matches('>')) {
            t.error(this);
            t.advanceTransition(Data);
        } 
        else if (r.matches('#') || r.matches('@')) { // Added this condition
            t.error(this);
            t.emit("</");
            t.transition(Data);
        } 
        else {
            t.error(this);
            t.advanceTransition(BogusComment);
        }
    }
    },
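
To make the effect of the added branch easier to see, here is a minimal, self-contained sketch of the decision taken once `</` has been read. It is not jsoup code: the class `EndTagOpenSketch`, the enum `Action`, and the method `decide` are hypothetical names, the real tokeniser works on a `CharacterReader` rather than a `String`, and `Character.isLetter` only approximates `matchesLetter()`.

```java
public class EndTagOpenSketch {
    /** Hypothetical labels for the branches of the patched EndTagOpen state. */
    enum Action { EMIT_TEXT_AT_EOF, START_END_TAG, SKIP_EMPTY_END_TAG, EMIT_AS_TEXT, BOGUS_COMMENT }

    /** rest is the input that follows the "</" the tokeniser has already consumed. */
    static Action decide(String rest) {
        if (rest.isEmpty()) {
            return Action.EMIT_TEXT_AT_EOF;      // emit "</" and return to the Data state
        }
        char c = rest.charAt(0);
        if (Character.isLetter(c)) {
            return Action.START_END_TAG;         // ordinary end tag such as </p>
        }
        if (c == '>') {
            return Action.SKIP_EMPTY_END_TAG;    // "</>" is dropped
        }
        if (c == '#' || c == '@') {
            return Action.EMIT_AS_TEXT;          // added branch: keep "</#" and "</@" as text
        }
        return Action.BOGUS_COMMENT;             // everything else becomes a comment
    }

    public static void main(String[] args) {
        for (String rest : new String[] {"#list>", "@tom.message>", "p>", "!x>"}) {
            System.out.println("</" + rest + " -> " + decide(rest));
        }
    }
}
```

Without the added branch, `</#list>` and `</@tom.message>` fall through to the bogus-comment case, which matches the `<!--#list-->` and `<!--@tom.message-->` lines in the output shown in the question.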

Entities.java

if (codePoint < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            final char c = (char) codePoint;
            switch (c) {
                case '&':
                    accum.append("&amp;");
                    break;
                case 0xA0:
                    if (escapeMode != EscapeMode.xhtml)
                        accum.append("&nbsp;");
                    else
                        accum.append("&#xa0;");
                    break;
                case '<':
                    if (!inAttribute || escapeMode == EscapeMode.xhtml)
                        accum.append("<"); // Modified this line: emit a literal '<' instead of "&lt;"
                    else
                        accum.append(c);
                    break;
                case '>':
                    if (!inAttribute)
                        accum.append(">"); // Modified this line: emit a literal '>' instead of "&gt;"
                    else
                        accum.append(c);
                    break;
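
The excerpt above stops partway through the switch. As a rough illustration of what the modified cases do to text content, the following self-contained sketch applies the same rules to a plain string. It is not jsoup's `Entities` class: `EscapeSketch` and `escapeText` are hypothetical names, only text nodes (not attribute values) are modelled, and a non-XHTML escape mode is assumed.

```java
public class EscapeSketch {
    /**
     * Escapes text content following the patched cases shown above:
     * '&' and the non-breaking space are still escaped,
     * while '<' and '>' are written through unchanged.
     */
    static String escapeText(String text) {
        StringBuilder accum = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    accum.append("&amp;");
                    break;
                case 0xA0:
                    accum.append("&nbsp;"); // assumes a non-XHTML escape mode
                    break;
                default:
                    accum.append(c);        // includes '<' and '>' after the patch
            }
        }
        return accum.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("<#list items as item>"));
        System.out.println(escapeText("<#if a && b >"));
    }
}
```

In this sketch `<#list items as item>` is written back unchanged, while `<#if a && b >` still becomes `<#if a &amp;&amp; b >`, because the `'&'` case shown in the answer is left as it was.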