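I am trying to create an XML object using <cfxml>. I formatted all the data with XMLFormat().
The XML contains some invalid characters, such as '»'. I added these characters to the XML doctype like this: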
<!ENTITY raquo "»">
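The HTML text is not well formed, but most of it works with my code. Some of the text, however, contains control characters, and I get the following error:
An invalid XML character (Unicode: 0x13) was found in the element content of the document.
I tried adding the Unicode value to the doctype, and then tried this solution. Neither worked...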
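Answer 0 (score: 2)
This is working cfscript code that we use to clean our XML. There are two methods: one cleans the high international characters, and the other cleans only the low ASCII characters that break XML. If you find more offending characters, just extend the filter rules.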
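<cfscript>
    // Replaces every non-ASCII character: smart quotes and ellipses become
    // plain ASCII, everything else becomes a numeric character reference.
    function cleanHighAscii(text){
        var buffer = createObject("java", "java.lang.StringBuffer").init();
        var pattern = createObject("java", "java.util.regex.Pattern").compile(javaCast( "string", "[^\x00-\x7F]" ));
        var matcher = pattern.matcher(javaCast( "string", text));
        while(matcher.find()){
            var value = matcher.group();
            var asciiValue = asc(value);
            if ((asciiValue == 8220) || (asciiValue == 8221))
                value = """";      // curly double quotes
            else if ((asciiValue == 8216) || (asciiValue == 8217))
                value = "'";       // curly single quotes
            else if (asciiValue == 8230)
                value = "...";     // horizontal ellipsis
            else
                value = "&###asciiValue#;";  // numeric character reference
            matcher.appendReplacement(buffer, javaCast( "string", value ));
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    // Replaces the SUB control character (0x1A) with a numeric reference.
    // Note: XML 1.0 also rejects references to control characters, so use ""
    // as the replacement if the result must parse as XML 1.0.
    function removeSubAscii(text){
        return rereplaceNoCase(text, "\x1A","&###26#;", "all");
    }

    // Runs both filters in sequence.
    function XMLSafe(text){
        text = cleanHighAscii(text);
        text = removeSubAscii(text);
        return text;
    }
</cfscript>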
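For readers who call Java directly, the same two-pass idea can be sketched in plain Java. This is only an illustration, not the code we run: the class and method names (XmlTextCleaner, stripControls, xmlSafe) are made up for this example, and unlike removeSubAscii above it deletes the forbidden control characters instead of replacing them, on the assumption that the output has to parse as XML 1.0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class XmlTextCleaner {
    // Any single code point outside 7-bit ASCII.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");
    // Control characters below 0x20 other than tab, line feed and carriage return.
    private static final Pattern FORBIDDEN_CONTROLS =
            Pattern.compile("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]");

    // Maps smart quotes and ellipses to ASCII, everything else to &#NNNN;.
    static String cleanHighAscii(String text) {
        Matcher m = NON_ASCII.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int cp = m.group().codePointAt(0);
            String replacement;
            if (cp == 8220 || cp == 8221) {
                replacement = "\"";
            } else if (cp == 8216 || cp == 8217) {
                replacement = "'";
            } else if (cp == 8230) {
                replacement = "...";
            } else {
                replacement = "&#" + cp + ";";
            }
            // quoteReplacement guards against '$' or '\' in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Assumption: control characters are dropped rather than encoded.
    static String stripControls(String text) {
        return FORBIDDEN_CONTROLS.matcher(text).replaceAll("");
    }

    static String xmlSafe(String text) {
        return stripControls(cleanHighAscii(text));
    }

    public static void main(String[] args) {
        System.out.println(xmlSafe("\u201Cquoted\u201D \u00BB next\u0013"));
    }
}
```

The character class for the control characters is the part to extend if other characters turn out to break your parser.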
Another possibility is to use the CF10 function encodeForXML():
https://learn.adobe.com/wiki/display/coldfusionen/EncodeForXML
Or use ESAPI directly, which ships with CF10, or add the ESAPI jar from the OWASP site https://www.owasp.org/index.php/ESAPI_Overview to your older CF:
var esapi = createObject("java", "org.owasp.esapi.ESAPI");
var esapiEncoder = esapi.encoder();
return esapiEncoder.encodeForXML(text);
Answer 1 (score: 0)
Try using &#187; in place of ». For example, this CFML:
<cfxml variable="x"><?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc
[
<!ENTITY raquo "&#187;">
]>
<doc>
Hello, &raquo; !
</doc>
</cfxml>
<cfdump var="#x#">
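To check the entity declaration outside ColdFusion, here is a small sketch using the JDK's built-in DOM parser. It assumes the entity is declared with the numeric character reference &#187; and referenced as &raquo; in the body; the class name EntityDemo and the method parseText are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names are illustrative only.
class EntityDemo {
    // Parses the XML string and returns the trimmed text of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent().trim();
        } catch (Exception e) {
            // Wrapped so that callers do not need to handle checked exceptions.
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE doc [\n"
            + "  <!ENTITY raquo \"&#187;\">\n"
            + "]>\n"
            + "<doc>Hello, &raquo; !</doc>";
        // The parser expands the internal entity while building the DOM.
        System.out.println(parseText(xml));
    }
}
```

Note that this only helps with named entities. A control character such as 0x13 is rejected by the parser whether it appears literally or as a character reference.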
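Answer 2 (score: -1)
Pass your XML string to this method and it will solve your problem.
It lets only valid XML characters through from the input. If you want to replace the invalid characters with something else instead of dropping them, you can modify the method below to do that.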
public String stripNonValidXMLCharacters(String in) {
    if (in == null || in.isEmpty()) return ""; // Nothing to filter.
    StringBuilder out = new StringBuilder(); // Holds the filtered output.
    // Walk by code point rather than by char: a char never exceeds 0xFFFF,
    // so the supplementary range check below would otherwise never match.
    for (int i = 0; i < in.length(); ) {
        int current = in.codePointAt(i);
        i += Character.charCount(current);
        if ((current == 0x9) ||
            (current == 0xA) ||
            (current == 0xD) ||
            ((current >= 0x20) && (current <= 0xD7FF)) ||
            ((current >= 0xE000) && (current <= 0xFFFD)) ||
            ((current >= 0x10000) && (current <= 0x10FFFF)))
            out.appendCodePoint(current);
    }
    return out.toString();
}
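The same character ranges can also be written as a single regular expression, which some people find easier to maintain. This is a sketch rather than a drop-in replacement: the class name XmlCharFilter and the method strip are made up here, and the \x{...} syntax for code points above 0xFFFF needs Java 7 or later.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class XmlCharFilter {
    // Negated class: matches anything outside the ranges listed in the method above.
    private static final Pattern INVALID_XML_10 = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Removes every character that is not allowed in an XML 1.0 document.
    static String strip(String in) {
        if (in == null) return "";
        return INVALID_XML_10.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        // 0x13 is removed; the tab and the emoji (a surrogate pair) are kept.
        System.out.println(strip("a\u0013b\tc \uD83D\uDE00"));
    }
}
```

To substitute a placeholder instead of deleting, pass a different replacement string to replaceAll, for example "?".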