Question

I have the following XML snippet in a string (note the double-encoded &amp;):

...
&lt;PARA&gt;
S&amp;amp;P
&lt;/PARA&gt;
...

The output I want is:

> ... <PARA> S&amp;P </PARA> ...

If I use:

StringEscapeUtils.unescapeXml()

the actual output is:

 > ... <PARA> S&P </PARA> ...

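It seems that StringEscapeUtils.unescapeXml() unescapes the input twice, or keeps going for as long as the input contains entities.

Is there a better utility method, or a simple solution, that unescapes every XML entity exactly once (not just a few, but all accented characters as well), so that my encoded &amp; part does not get mangled?

Thanks, Peter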

Answer 1

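When you use a third-party library, you should include the library name and version.

StringEscapeUtils is part of Apache Commons Text and of Apache Commons Lang (where it is deprecated). The latest versions (as of November 2017) are Commons Text 1.1 and Commons Lang 3.7. Both versions show the correct result.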

import org.apache.commons.text.StringEscapeUtils;
public class EscapeTest {
  public static void main(String[] args) {
    final String s = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
    System.out.println(StringEscapeUtils.unescapeXml(s));
  }
}

Output: <PARA> S&amp;P </PARA>
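If your own run still prints S&P, one possible explanation (a guess, not something the question confirms) is that the string is being unescaped a second time somewhere else in the pipeline. Here is a minimal JDK-only sketch of the difference between one pass and two. The helper unescapeOnce is hypothetical, covers only the five predefined XML entities, and is not how Commons implements unescapeXml.

```java
// Hypothetical JDK-only helper for illustration; not the Commons implementation.
final class UnescapeTwiceDemo {

    // Unescapes the five predefined XML entities exactly once.
    // "&amp;" is replaced last so that the "&" it produces cannot
    // combine with the text after it and be unescaped again.
    static String unescapeOnce(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&apos;", "'")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String s = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        String once = unescapeOnce(s);
        String twice = unescapeOnce(once);
        System.out.println(once);  // <PARA> S&amp;P </PARA>
        System.out.println(twice); // <PARA> S&P </PARA>
    }
}
```

The second line of output matches what the question reports, so it is worth checking whether anything else (an XML parser, a template engine, a browser rendering the result) touches the string before you look at it.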

Answer 2

Maybe this is the long way around, but I am not able to use Apache Commons.

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnescapeOnce {

    public static void main(String[] args) {
        String a = "&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;";
        String ea = unescapeXML(a);
        System.out.println(ea);
    }

    public static String unescapeXML(final String xml) {
        // Group 1 is "#" for numeric references; group 2 is the name or number.
        // Excluding '&' and whitespace keeps a stray "&" from swallowing a later entity.
        Pattern xmlEntityRegex = Pattern.compile("&(#?)([^&;\\s]+);");
        StringBuffer unescapedOutput = new StringBuffer(xml.length());

        Matcher m = xmlEntityRegex.matcher(xml);
        Map<String, String> builtinEntities = null;
        String entity;
        while (m.find()) {
            String ent = m.group(2);
            if (!m.group(1).isEmpty()) {
                try {
                    // Numeric reference: decimal (&#38;) or hexadecimal (&#x26;).
                    int code = ent.startsWith("x")
                            ? Integer.parseInt(ent.substring(1), 16)
                            : Integer.parseInt(ent);
                    entity = new String(Character.toChars(code));
                } catch (IllegalArgumentException e) {
                    entity = m.group(); // not a valid reference, keep it as-is
                }
            } else {
                if (builtinEntities == null) {
                    builtinEntities = buildBuiltinXMLEntityMap();
                }
                entity = builtinEntities.get(ent);
                if (entity == null) {
                    entity = m.group(); // unknown entity, keep it as-is
                }
            }
            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(unescapedOutput, Matcher.quoteReplacement(entity));
        }
        m.appendTail(unescapedOutput);
        return unescapedOutput.toString();
    }

    private static Map<String, String> buildBuiltinXMLEntityMap() {
        Map<String, String> entities = new HashMap<>(10);
        entities.put("lt", "<");
        entities.put("gt", ">");
        entities.put("amp", "&");
        entities.put("apos", "'");
        entities.put("quot", "\"");
        return entities;
    }
}

Output:

<PARA> S&amp;P </PARA>

XML entities are unescaped only once.
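If you happen to be on Java 9 or later (an assumption, the question does not say), the same single-pass idea can be written more compactly with Matcher.replaceAll(Function). This is only a sketch: the class name, the decode helper and the choice to leave unknown or malformed references untouched are mine, and it knows only the five predefined entities plus decimal and hexadecimal character references.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Java 9+ variant of a single-pass XML unescape.
final class UnescapeOnceJava9 {

    // Group 1: "#x" (hex), "#" (decimal) or absent (named entity). Group 2: name or digits.
    private static final Pattern ENTITY = Pattern.compile("&(#x|#)?([0-9A-Za-z]+);");

    private static final Map<String, String> NAMED = Map.of(
            "lt", "<", "gt", ">", "amp", "&", "apos", "'", "quot", "\"");

    static String unescapeOnce(String xml) {
        // Each match is replaced once; the replacement text is never rescanned.
        return ENTITY.matcher(xml).replaceAll(m -> {
            String decoded = decode(m.group(1), m.group(2));
            // Keep the original text for anything that cannot be decoded.
            return Matcher.quoteReplacement(decoded != null ? decoded : m.group());
        });
    }

    // Returns the decoded text, or null if the reference is unknown or malformed.
    private static String decode(String prefix, String body) {
        if (prefix == null) {
            return NAMED.get(body);
        }
        try {
            int codePoint = Integer.parseInt(body, prefix.equals("#x") ? 16 : 10);
            return new String(Character.toChars(codePoint));
        } catch (IllegalArgumentException e) {
            return null; // not a number, or not a valid code point
        }
    }

    public static void main(String[] args) {
        System.out.println(unescapeOnce("&lt;PARA&gt; S&amp;amp;P &lt;/PARA&gt;"));
        // prints: <PARA> S&amp;P </PARA>
    }
}
```

Matcher.quoteReplacement matters here as well: without it, a decoded "$" or "\" would be interpreted as a group reference in the replacement string.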