String contents are the same but the equals method returns false

Posted: 2013-04-25 06:12:16

Tags: java stringescapeutils

I am using StringEscapeUtils to escape and unescape HTML. I have the following code:

import org.apache.commons.lang.StringEscapeUtils;

public class EscapeUtils {

    public static void main(String args[]) {

        String string = "    4-Spaces    ,\"Double Quote\", 'Single Quote', \\Back-Slash\\, /Forward Slash/ ";

        String escaped = StringEscapeUtils.escapeHtml(string);
        String myEscaped = escapeHtml(string);

        String unescaped = StringEscapeUtils.unescapeHtml(escaped);
        String myUnescaped = StringEscapeUtils.unescapeHtml(myEscaped);

        System.out.println("Real String: " + string);
        System.out.println();
        System.out.println("Escaped String: " + escaped);
        System.out.println("My Escaped String: " + myEscaped);
        System.out.println();
        System.out.println("Unescaped String: " + unescaped);
        System.out.println("My Unescaped String: " + myUnescaped);
        System.out.println();
        System.out.println("Comparison:");
        System.out.println("Real String == Unescaped String: " + string.equals(unescaped));
        System.out.println("Real String == My Unescaped String: " + string.equals(myUnescaped));
        System.out.println("Unescaped String == My Unescaped String: " + unescaped.equals(myUnescaped));

    }

    public static String escapeHtml(String s) {
        String escaped = "";
        if(null != s) {
            escaped = StringEscapeUtils.escapeHtml(s);
            escaped = escaped.replaceAll(" ","&nbsp;");
            escaped = escaped.replaceAll("'","&#39;");
            escaped = escaped.replaceAll("\\\\","&#92;");
            escaped = escaped.replaceAll("/","&#47;");
        }
        return escaped;
    }

}

Output:

Real String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 

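Escaped String:     4-Spaces    ,&quot;Double Quote&quot;, 'Single Quote', \Back-Slash\, /Forward Slash/ 
My Escaped String: &nbsp;&nbsp;&nbsp;&nbsp;4-Spaces&nbsp;&nbsp;&nbsp;&nbsp;,&quot;Double Quote&quot;,&nbsp;&#39;Single Quote&#39;,&nbsp;&#92;Back-Slash&#92;,&nbsp;&#47;Forward Slash&#47;&nbsp;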

Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 
My Unescaped String:     4-Spaces    ,"Double Quote", 'Single Quote', \Back-Slash\, /Forward Slash/ 

Comparison:
Real String == Unescaped String: true
Real String == My Unescaped String: false
Unescaped String == My Unescaped String: false

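escaped is the real string after escaping, and unescaped is escaped after unescaping. myEscaped is first escaped by the same process and then has a few more characters replaced with their HTML codes. myUnescaped is simply myEscaped after unescaping, and its content looks the same as the real string.

The output shows that string, unescaped, and myUnescaped have the same content. However, as the comparison section shows, myUnescaped is not equal to string or to unescaped.

I don't understand what is going on here. Can anyone explain?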

2 answers:

Answer 0 (score: 3)

This is because, when escaping the HTML, you are replacing ' ' with &nbsp;

public static String escapeHtml(String s) {
        String escaped = "";
        if(null != s) {
            escaped = StringEscapeUtils.escapeHtml(s);
            escaped = escaped.replaceAll(" ","&nbsp;"); // HERE
            escaped = escaped.replaceAll("'","&#39;");
            escaped = escaped.replaceAll("\\\\","&#92;");
            escaped = escaped.replaceAll("/","&#47;");
        }
        return escaped;
    }

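StringEscapeUtils.escapeHtml, on the other hand, does not escape ' '. Here is the example from its website:

"bread" & "butter"

becomes

&quot;bread&quot; &amp; &quot;butter&quot;

which shows that StringEscapeUtils.escapeHtml leaves spaces untouched.

If you remove escaped = escaped.replaceAll(" ","&nbsp;"); from escapeHtml, unescaped and myUnescaped match!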
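The mismatch can be reproduced with the JDK alone. The sketch below assumes that the character hiding in myUnescaped is the non-breaking space U+00A0, which is what &nbsp; stands for. The class name NbspVsSpace and the helper equalAfterNormalizing are made up for illustration and are not part of commons-lang.

```java
final class NbspVsSpace {

    // Hypothetical helper: compares two strings after mapping the
    // non-breaking space (U+00A0) to an ordinary space (U+0020).
    static boolean equalAfterNormalizing(String a, String b) {
        return a.replace('\u00A0', ' ').equals(b.replace('\u00A0', ' '));
    }

    public static void main(String[] args) {
        String plain = "Single Quote";       // ordinary space, code 32
        String nbsp = "Single\u00A0Quote";   // non-breaking space, code 160

        // Both print alike on most consoles, yet they are different strings.
        System.out.println(plain);
        System.out.println(nbsp);

        System.out.println(plain.equals(nbsp));                  // false
        System.out.println(equalAfterNormalizing(plain, nbsp));  // true
    }
}
```

Normalizing at comparison time is only one option; not producing &nbsp; in the first place avoids the problem entirely.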

Answer 1 (score: 1)

Following Apurv's answer, I examined the byte arrays of the strings.

String:        32,  32,  32,  32,  52,  45,  83, 112,  97,  99, 101, 115,  32,  32,  32,  32,  44,  34,  68, 111, 117,  98, 108, 101,  32,  81, 117, 111, 116, 101,  34,  44,  32,  39,  83, 105, 110, 103, 108, 101,  32,  81, 117, 111, 116, 101,  39,  44,  32,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44,  32,  47,  70, 111, 114, 119,  97, 114, 100,  32,  83, 108,  97, 115, 104,  47,  32
unescaped :    32,  32,  32,  32,  52,  45,  83, 112,  97,  99, 101, 115,  32,  32,  32,  32,  44,  34,  68, 111, 117,  98, 108, 101,  32,  81, 117, 111, 116, 101,  34,  44,  32,  39,  83, 105, 110, 103, 108, 101,  32,  81, 117, 111, 116, 101,  39,  44,  32,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44,  32,  47,  70, 111, 114, 119,  97, 114, 100,  32,  83, 108,  97, 115, 104,  47,  32
myUnescaped:  -96, -96, -96, -96,  52,  45,  83, 112,  97,  99, 101, 115, -96, -96, -96, -96,  44,  34,  68, 111, 117,  98, 108, 101, -96,  81, 117, 111, 116, 101,  34,  44, -96,  39,  83, 105, 110, 103, 108, 101, -96,  81, 117, 111, 116, 101,  39,  44, -96,  92,  66,  97,  99, 107,  45,  83, 108,  97, 115, 104,  92,  44, -96,  47,  70, 111, 114, 119,  97, 114, 100, -96,  83, 108,  97, 115, 104,  47, -96

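It seems that in myUnescaped the spaces have become byte -96 instead of 32. That value is not an ASCII code: -96 is the signed-byte form of 0xA0 (160), the non-breaking space U+00A0 that &nbsp; unescapes to.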
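A byte dump like the one above can be produced with the JDK alone. This is a minimal sketch, assuming ISO-8859-1 as the encoding (the post does not say which charset was used) so that U+00A0 maps to the single byte -96. The class ByteDump and the helper firstDifference are illustrative names, not anything from the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteDump {

    // Index of the first char at which a and b differ, or -1 if they are equal.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return a.length() == b.length() ? -1 : n;
    }

    public static void main(String[] args) {
        String original = " 4-Spaces";           // starts with U+0020
        String roundTripped = "\u00A04-Spaces";  // starts with U+00A0

        // Assumption: ISO-8859-1, where every char below 256 is one byte.
        System.out.println(Arrays.toString(original.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.toString(roundTripped.getBytes(StandardCharsets.ISO_8859_1)));

        int i = firstDifference(original, roundTripped);
        System.out.printf("index %d: U+%04X vs U+%04X%n",
                i, (int) original.charAt(i), (int) roundTripped.charAt(i));
    }
}
```

Printing code points rather than bytes sidesteps the signed-byte confusion, since 0xA0 then shows up as 00A0 instead of -96.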

So I wrote an unescapeHtml method, shown below. It first replaces &nbsp; with an ordinary space and then uses StringEscapeUtils to unescape the HTML.

public static String unescapeHtml(String s) {
    String unescaped = "";
    if(null != s) {
        unescaped = s.replaceAll("&nbsp;", " ");
        unescaped = StringEscapeUtils.unescapeHtml(unescaped);
    }
    return unescaped;
}

I then obtained myUnescaped with the following code:

String myUnescaped = unescapeHtml(myEscaped);

This makes myUnescaped equal to string and unescaped.

Alternatively, I replaced the space with &#32; instead of &nbsp;. This way I don't need to write an unescapeHtml method. The updated escapeHtml method is shown below.

public static String escapeHtml(String s) {
    String escaped = "";
    if(null != s) {
        escaped = StringEscapeUtils.escapeHtml(s);
        escaped = escaped.replaceAll(" ","&#32;");    //updated line 
        escaped = escaped.replaceAll("'","&#39;");
        escaped = escaped.replaceAll("\\\\","&#92;");
        escaped = escaped.replaceAll("/","&#47;");
    }
    return escaped;
}
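Why a numeric reference to code 32 round-trips while &nbsp; does not can be sketched with the JDK alone. The decoder below is a hypothetical stand-in (decodeDecimalRefs is not a commons-lang method) that handles only decimal references of the form &#NNN;. It relies on the fact that &#160; is the numeric equivalent of &nbsp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericEntityDemo {

    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d+);");

    // Hypothetical helper: decodes decimal character references only.
    // Named entities and hexadecimal references are left untouched.
    static String decodeDecimalRefs(String s) {
        Matcher m = DECIMAL_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String decoded = new String(Character.toChars(codePoint));
            // quoteReplacement keeps a decoded backslash or dollar sign literal
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String viaSpaceRef = decodeDecimalRefs("Single&#32;Quote");
        String viaNbspRef = decodeDecimalRefs("Single&#160;Quote");

        System.out.println("Single Quote".equals(viaSpaceRef)); // true
        System.out.println("Single Quote".equals(viaNbspRef));  // false
    }
}
```

The trade-off is in rendering: a browser collapses runs of ordinary spaces whether they are written literally or as numeric references to code 32, so this variant keeps equals working but no longer preserves the four-space runs on screen the way &nbsp; does.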