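I need to unescape an XML string that contains escaped XML markup:
&lt;
&gt;
&amp;
etc...
I did find some libraries that can do this, but I would rather use a single method for it.
Can someone help?
Cheers, Bas Hendriks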
Answer 0 (score: 45)
StringEscapeUtils.unescapeXml(xml) (from Apache Commons Lang; newer releases provide the same class in Apache Commons Text)
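Answer 1 (score: 6)
Here is a simple method to unescape XML. It handles the predefined XML entities and decimal numeric entities (&#nnnn;). Modifying it to handle hexadecimal entities (&#xhhhh;) should be simple.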
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String unescapeXML( final String xml )
{
    Pattern xmlEntityRegex = Pattern.compile( "&(#?)([^;]+);" );
    //Before Java 9, Matcher.appendReplacement only accepts a StringBuffer, not a StringBuilder
    StringBuffer unescapedOutput = new StringBuffer( xml.length() );
    Matcher m = xmlEntityRegex.matcher( xml );
    Map<String,String> builtinEntities = null;
    String entity;
    String hashmark;
    String ent;
    int code;
    while ( m.find() ) {
        ent = m.group(2);
        hashmark = m.group(1);
        if ( (hashmark != null) && (hashmark.length() > 0) ) {
            try {
                //decimal numeric entity; toChars also covers code points above U+FFFF
                code = Integer.parseInt( ent );
                entity = new String( Character.toChars( code ) );
            } catch ( IllegalArgumentException e ) {
                //not a valid decimal code point (e.g. the hex form) - leave it unchanged
                entity = m.group();
            }
        } else {
            //must be a non-numerical entity
            if ( builtinEntities == null ) {
                builtinEntities = buildBuiltinXMLEntityMap();
            }
            entity = builtinEntities.get( ent );
            if ( entity == null ) {
                //not a known entity - ignore it
                entity = "&" + ent + ';';
            }
        }
        //quoteReplacement stops '$' and '\' in the replacement from being read as group references
        m.appendReplacement( unescapedOutput, Matcher.quoteReplacement( entity ) );
    }
    m.appendTail( unescapedOutput );
    return unescapedOutput.toString();
}

private static Map<String,String> buildBuiltinXMLEntityMap()
{
    //the five entities predefined by XML
    Map<String,String> entities = new HashMap<String,String>(10);
    entities.put( "lt", "<" );
    entities.put( "gt", ">" );
    entities.put( "amp", "&" );
    entities.put( "apos", "'" );
    entities.put( "quot", "\"" );
    return entities;
}
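The answer says that extending the method to hexadecimal entities should be simple. Below is a minimal sketch of one way to do that, not the original author's code: the class name NumericEntityDemo, the method name unescapeNumericEntities, and the regular expression are assumptions made for illustration. It only handles numeric references (&#nnnn; and &#xhhhh;) and leaves everything else, including named entities, untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class NumericEntityDemo {
    // Group 1 captures hex digits (&#x41;), group 2 captures decimal digits (&#65;).
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    static String unescapeNumericEntities(String xml) {
        Matcher m = NUMERIC_ENTITY.matcher(xml);
        StringBuffer out = new StringBuffer(xml.length());
        while (m.find()) {
            String replacement;
            try {
                int code = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                // Character.toChars yields a surrogate pair for code points above U+FFFF.
                replacement = Character.isValidCodePoint(code)
                        ? new String(Character.toChars(code))
                        : m.group();
            } catch (NumberFormatException e) {
                // The number does not fit in an int: keep the reference as written.
                replacement = m.group();
            }
            // quoteReplacement keeps '$' and '\' from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeNumericEntities("&#65;&#x42;&#36;")); // prints AB$
    }
}
```

In the method above, the same idea would amount to checking whether the captured text starts with x and parsing the remainder with radix 16.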
Answer 2 (score: 4)
Here is one I wrote in ten minutes. It does not use regular expressions, only simple iteration. I don't think it can be made much faster.
public static String unescape(final String text) {
    StringBuilder result = new StringBuilder(text.length());
    int i = 0;
    int n = text.length();
    while (i < n) {
        char charAt = text.charAt(i);
        if (charAt != '&') {
            result.append(charAt);
            i++;
        } else {
            // Each branch consumes the whole entity, so i advances by its length
            if (text.startsWith("&amp;", i)) {
                result.append('&');
                i += 5;
            } else if (text.startsWith("&apos;", i)) {
                result.append('\'');
                i += 6;
            } else if (text.startsWith("&quot;", i)) {
                result.append('"');
                i += 6;
            } else if (text.startsWith("&lt;", i)) {
                result.append('<');
                i += 4;
            } else if (text.startsWith("&gt;", i)) {
                result.append('>');
                i += 4;
            } else {
                // Not a predefined entity: keep the ampersand instead of dropping it
                result.append(charAt);
                i++;
            }
        }
    }
    return result.toString();
}
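Besides speed, a single left-to-right pass like this one avoids a pitfall of chaining String.replace calls: text that has already been unescaped can be unescaped a second time. The sketch below illustrates the difference under stated assumptions. The class UnescapeOrderDemo and both method names are hypothetical, and singlePass is a compact table-driven variant of the loop above rather than the author's exact code.

```java
// Hypothetical demo class, named for this example only.
final class UnescapeOrderDemo {
    private static final String[] ENTITIES = {"&amp;", "&apos;", "&quot;", "&lt;", "&gt;"};
    private static final char[] CHARS = {'&', '\'', '"', '<', '>'};

    // Naive comparison case: replacing &amp; first lets a later replace
    // match text that the first replace just produced.
    static String chainedReplace(String text) {
        return text.replace("&amp;", "&")
                   .replace("&apos;", "'")
                   .replace("&quot;", "\"")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">");
    }

    // Single pass: every input character is examined once, and output is never rescanned.
    static String singlePass(String text) {
        StringBuilder result = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            if (text.charAt(i) == '&') {
                for (int k = 0; k < ENTITIES.length; k++) {
                    if (text.startsWith(ENTITIES[k], i)) {
                        result.append(CHARS[k]);
                        i += ENTITIES[k].length();
                        matched = true;
                        break;
                    }
                }
            }
            if (!matched) {
                // Ordinary character or unknown entity: copy it through unchanged.
                result.append(text.charAt(i));
                i++;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String input = "&amp;lt;";                  // the escaped form of the literal text "&lt;"
        System.out.println(chainedReplace(input));  // prints <
        System.out.println(singlePass(input));      // prints &lt;
    }
}
```

The chained version could be repaired by replacing &amp; last, but the single pass sidesteps the ordering question entirely.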
Answer 3 (score: 0)
If you are working with JSP, use su:unescapeXml from openutils-elfunctions.