I am using StringEscapeUtils to escape and unescape HTML. I have the following code:
import org.apache.commons.lang.StringEscapeUtils;

public class EscapeUtils {

    public static void main(String[] args) {
        // Four leading spaces, four spaces before the comma, one trailing space
        String string = "    4-Spaces    ,\"Double Quote\", 'Single Quote', \\Back-Slash\\, /Forward Slash/ ";
        String escaped = StringEscapeUtils.escapeHtml(string);
        String myEscaped = escapeHtml(string);
        String unescaped = StringEscapeUtils.unescapeHtml(escaped);
        String myUnescaped = StringEscapeUtils.unescapeHtml(myEscaped);

        System.out.println("Real String: " + string);
        System.out.println();
        System.out.println("Escaped String: " + escaped);
        System.out.println("My Escaped String: " + myEscaped);
        System.out.println();
        System.out.println("Unescaped String: " + unescaped);
        System.out.println("My Unescaped String: " + myUnescaped);
        System.out.println();
        System.out.println("Comparison:");
        System.out.println("Real String == Unescaped String: " + string.equals(unescaped));
        System.out.println("Real String == My Unescaped String: " + string.equals(myUnescaped));
        System.out.println("Unescaped String == My Unescaped String: " + unescaped.equals(myUnescaped));
    }

    // Escapes with StringEscapeUtils first, then encodes some characters it leaves as-is
    public static String escapeHtml(String s) {
        String escaped = "";
        if (null != s) {
            escaped = StringEscapeUtils.escapeHtml(s);
            escaped = escaped.replaceAll(" ", "&nbsp;");   // space
            escaped = escaped.replaceAll("'", "&#39;");    // single quote
            escaped = escaped.replaceAll("\\\\", "&#92;"); // backslash (the regex \\ matches one \)
            escaped = escaped.replaceAll("/", "&#47;");    // forward slash
        }
        return escaped;
    }
}
Output:
Real String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/

Escaped String:     4-Spaces    ,&quot;Double Quote&quot;, 'Single Quote', \Back-Slash\, /Forward Slash/
My Escaped String: &nbsp;&nbsp;&nbsp;&nbsp;4-Spaces&nbsp;&nbsp;&nbsp;&nbsp;,&quot;Double&nbsp;Quote&quot;,&nbsp;&#39;Single&nbsp;Quote&#39;,&nbsp;&#92;Back-Slash&#92;,&nbsp;&#47;Forward&nbsp;Slash&#47;&nbsp;

Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/
My Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/

Comparison:
Real String == Unescaped String: true
Real String == My Unescaped String: false
Unescaped String == My Unescaped String: false
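I escaped the real string and then unescaped it. myEscaped, on the other hand, is first escaped the same way and then has more characters replaced with their HTML codes. myUnescaped is the unescaped form of myEscaped, and its content looks the same as the real string.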
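The output shows that string, unescaped and myUnescaped have the same content. However, as the Comparison section shows, myUnescaped is equal to neither string nor unescaped.
I still don't understand what exactly is going on here. Can anyone explain?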
Answer 0 (score: 3)
This is because, while escaping the HTML, you replace ' ' with '&nbsp;':
public static String escapeHtml(String s) {
    String escaped = "";
    if (null != s) {
        escaped = StringEscapeUtils.escapeHtml(s);
        escaped = escaped.replaceAll(" ", "&nbsp;"); // HERE
        escaped = escaped.replaceAll("'", "&#39;");
        escaped = escaped.replaceAll("\\\\", "&#92;");
        escaped = escaped.replaceAll("/", "&#47;");
    }
    return escaped;
}
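whereas StringEscapeUtils.escapeHtml does not escape ' ' at all. Here is the example from its documentation:
"bread" & "butter"
becomes
&quot;bread&quot; &amp; &quot;butter&quot;
which shows that StringEscapeUtils.escapeHtml leaves spaces untouched.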
If you remove escaped = escaped.replaceAll(" ", "&nbsp;"); from your escapeHtml method, unescaped and myUnescaped match!
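To see why equals() fails even though the printed strings look identical, here is a minimal stdlib-only sketch. It does not call Commons Lang: the literal "Double\u00A0Quote" simply stands in for what unescaping "Double&nbsp;Quote" produces, and the class and method names are made up for illustration. It compares two strings char by char and reports the first position where they differ:

```java
public class NbspDiff {

    // Returns the index of the first differing char, or -1 if the strings are equal.
    // If one string is a prefix of the other, the shorter length is returned.
    public static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String real = "Double Quote";              // ordinary space, U+0020
        String roundTripped = "Double\u00A0Quote"; // no-break space, U+00A0

        System.out.println(real.equals(roundTripped)); // false
        int i = firstDifference(real, roundTripped);
        // Prints: index 6: U+0020 vs U+00A0
        System.out.printf("index %d: U+%04X vs U+%04X%n",
                i, (int) real.charAt(i), (int) roundTripped.charAt(i));
    }
}
```

Both characters usually render as blank space in a console, which is why the output lines look the same while the comparison says false.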
Answer 1 (score: 1)
Following Apurv's answer, I analyzed the byte arrays of the strings:
String: 32, 32, 32, 32, 52, 45, 83, 112, 97, 99, 101, 115, 32, 32, 32, 32, 44, 34, 68, 111, 117, 98, 108, 101, 32, 81, 117, 111, 116, 101, 34, 44, 32, 39, 83, 105, 110, 103, 108, 101, 32, 81, 117, 111, 116, 101, 39, 44, 32, 92, 66, 97, 99, 107, 45, 83, 108, 97, 115, 104, 92, 44, 32, 47, 70, 111, 114, 119, 97, 114, 100, 32, 83, 108, 97, 115, 104, 47, 32
unescaped : 32, 32, 32, 32, 52, 45, 83, 112, 97, 99, 101, 115, 32, 32, 32, 32, 44, 34, 68, 111, 117, 98, 108, 101, 32, 81, 117, 111, 116, 101, 34, 44, 32, 39, 83, 105, 110, 103, 108, 101, 32, 81, 117, 111, 116, 101, 39, 44, 32, 92, 66, 97, 99, 107, 45, 83, 108, 97, 115, 104, 92, 44, 32, 47, 70, 111, 114, 119, 97, 114, 100, 32, 83, 108, 97, 115, 104, 47, 32
myUnescaped: -96, -96, -96, -96, 52, 45, 83, 112, 97, 99, 101, 115, -96, -96, -96, -96, 44, 34, 68, 111, 117, 98, 108, 101, -96, 81, 117, 111, 116, 101, 34, 44, -96, 39, 83, 105, 110, 103, 108, 101, -96, 81, 117, 111, 116, 101, 39, 44, -96, 92, 66, 97, 99, 107, 45, 83, 108, 97, 115, 104, 92, 44, -96, 47, 70, 111, 114, 119, 97, 114, 100, -96, 83, 108, 97, 115, 104, 47, -96
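It seems that in myUnescaped the spaces became -96 instead of 32 (the ASCII space). Java bytes are signed, so -96 is the byte 0xA0 (160), which is the no-break space U+00A0 that &nbsp; unescapes to, apparently encoded with a single-byte default charset.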
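The -96 values can be reproduced without Commons Lang. This is a small sketch assuming the byte dump came from getBytes() with a single-byte platform charset such as ISO-8859-1 (the answer does not say which charset was used); the class and the bytesOf helper are hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NbspBytes {

    // Encodes s with the given charset and formats the (signed) bytes like the dump above.
    public static String bytesOf(String s, Charset cs) {
        return Arrays.toString(s.getBytes(cs));
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // the no-break space that "&nbsp;" unescapes to

        // Java bytes are signed, so 0xA0 (160) is stored as 160 - 256 = -96.
        System.out.println((byte) 0xA0);                                // -96

        // One byte per char in ISO-8859-1, matching the -96 entries above.
        System.out.println(bytesOf(nbsp, StandardCharsets.ISO_8859_1)); // [-96]
        // UTF-8 needs two bytes (0xC2 0xA0) for the same char.
        System.out.println(bytesOf(nbsp, StandardCharsets.UTF_8));      // [-62, -96]
        // An ordinary space is 32 either way.
        System.out.println(bytesOf(" ", StandardCharsets.ISO_8859_1));  // [32]
    }
}
```

If the dump had shown -62, -96 pairs instead, that would point to a UTF-8 default charset; either way the character is U+00A0, not U+0020.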
So I wrote an unescapeHtml method, shown below. It first replaces &nbsp; with a space and then unescapes the HTML with StringEscapeUtils.
public static String unescapeHtml(String s) {
    String unescaped = "";
    if (null != s) {
        unescaped = s.replaceAll("&nbsp;", " ");
        unescaped = StringEscapeUtils.unescapeHtml(unescaped);
    }
    return unescaped;
}
Then I obtained myUnescaped with the following code:
String myUnescaped = unescapeHtml(myEscaped);
This makes myUnescaped equal to both string and unescaped.
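Another option, not taken from the answer above, is to keep &nbsp; in the escaped output and turn the no-break spaces back into ordinary spaces after unescaping. A sketch with a hypothetical normalizeNbsp helper (not a Commons Lang method), using only String.replace(char, char):

```java
public class NbspNormalize {

    // Hypothetical helper (not part of Commons Lang): replaces every no-break
    // space (U+00A0) with an ordinary space (U+0020). Intended to be applied to
    // the result of StringEscapeUtils.unescapeHtml(...).
    public static String normalizeNbsp(String s) {
        return s == null ? null : s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Stands in for unescapeHtml(myEscaped): same text, but U+00A0 instead of spaces.
        String unescapedWithNbsp = "\u00A0\u00A0Single\u00A0Quote";
        System.out.println(unescapedWithNbsp.equals("  Single Quote"));                // false
        System.out.println(normalizeNbsp(unescapedWithNbsp).equals("  Single Quote")); // true
    }
}
```

Like the unescapeHtml method above, this cannot tell apart a no-break space that was already in the original input: as far as I know, Commons Lang 2's escapeHtml also writes that character as &nbsp;, so it would come back as a plain space too.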
Alternatively, I replaced &nbsp; with &#32;, so I don't need to write the unescapeHtml method at all. The updated escapeHtml method is shown below.
public static String escapeHtml(String s) {
    String escaped = "";
    if (null != s) {
        escaped = StringEscapeUtils.escapeHtml(s);
        escaped = escaped.replaceAll(" ", "&#32;"); // updated line
        escaped = escaped.replaceAll("'", "&#39;");
        escaped = escaped.replaceAll("\\\\", "&#92;");
        escaped = escaped.replaceAll("/", "&#47;");
    }
    return escaped;
}
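The &#32; variant round-trips because a decimal character reference names the code point directly: &#32; is U+0020, the ordinary space, whereas &nbsp; is equivalent to &#160;, which is U+00A0. A deliberately tiny, hypothetical decoder that only understands the decimal form makes the difference visible (a real unescaper such as StringEscapeUtils.unescapeHtml also handles named and hexadecimal references):

```java
public class NumericRef {

    // Hypothetical mini-decoder that only understands decimal references such as
    // "&#32;". Real unescapers also handle named ("&nbsp;") and hex ("&#x20;") forms.
    public static char decodeDecimalRef(String ref) {
        if (ref.length() < 4 || !ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a decimal reference: " + ref);
        }
        // Throws NumberFormatException for non-decimal input such as "&#x20;".
        int codePoint = Integer.parseInt(ref.substring(2, ref.length() - 1));
        if (codePoint < 0 || codePoint > 0xFFFF) {
            throw new IllegalArgumentException("outside the BMP: " + ref);
        }
        return (char) codePoint;
    }

    public static void main(String[] args) {
        System.out.println((int) decodeDecimalRef("&#32;"));  // 32, the ordinary space
        System.out.println((int) decodeDecimalRef("&#160;")); // 160, the same char as "&nbsp;"
    }
}
```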