Decoding a string escaped by VBScript from Java

Time: 2014-03-25 07:39:24

Tags: java vbscript character-encoding escaping decoding

I am trying to decode the following string:

String str  = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";

System.out.println(StringEscapeUtils.unescapeHtml(str));
try {
    System.out.println("res:"+java.net.URLDecoder.decode(str, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Both approaches fail, as follows:

AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u2"
    at java.net.URLDecoder.decode(URLDecoder.java:173)
    at decrypt.DecryptHtml.main(DecryptHtml.java:19)

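The string comes from a VBS script that uses the Escape function. How can I decode this string?

2 answers:

Answer 0 (score: 3)

Unfortunately, from reading the documentation, it appears that Microsoft Has Done It Again(tm): "non-standard xxx", where "xxx" here is "escape format".

Specifically, the documentation of the VBScript function says: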

  

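[...] Unicode characters that have a value greater than 255 are stored using the %uxxxx format.

(Hey, MS: there are no "Unicode characters"; those are called code points.)

Great. So you need your own decoding function.

Fortunately, we are using Java. This proprietary escape sequence only covers Unicode code points in the Basic Multilingual Plane (U+0000 to U+FFFF), a char is a UTF-16 code unit, and there is a 1-to-1 mapping between the BMP and UTF-16, so each %uxxxx escape maps to exactly one char. That makes our job easier.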
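To make the BMP point concrete, here is a minimal sketch. The class name BmpCastDemo and the helper fromHex are made up for illustration and are not part of the answer's code. It only shows that the four hex digits of a %uxxxx escape can be parsed and cast straight to a char:

```java
public final class BmpCastDemo {
    // Hypothetical helper: turns the four hex digits of a %uxxxx escape
    // into the single char (UTF-16 code unit) they denote.
    public static char fromHex(final String fourHexDigits) {
        return (char) Integer.parseInt(fourHexDigits, 16);
    }

    public static void main(final String... args) {
        final char dash = fromHex("2013");
        System.out.println((int) dash);                   // 8211, i.e. 0x2013
        System.out.println(dash == '\u2013');             // true
        System.out.println(Character.isSurrogate(dash));  // false
    }
}
```

Code points above U+FFFF would need two chars (a surrogate pair), but as far as the quoted documentation goes, the %uxxxx format only carries four hex digits, so that case does not come up for a single escape.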

Here is the full code:

import java.nio.CharBuffer;

public final class MSUnescaper
{
    private static final char PERCENT = '%';
    private static final char NONSTANDARD_PCT_ESCAPE = 'u';

    private MSUnescaper()
    {
    }

    public static String unescape(final String input)
    {
        final StringBuilder sb = new StringBuilder(input.length());
        final CharBuffer buf = CharBuffer.wrap(input);

        char c;

        while (buf.hasRemaining()) {
            c = buf.get();
            if (c != PERCENT) {
                sb.append(c);
                continue;
            }
            if (!buf.hasRemaining())
                throw new IllegalArgumentException();
            c = buf.get();
            sb.append(c == NONSTANDARD_PCT_ESCAPE
                ? msEscape(buf) : standardEscape(buf, c));
        }

        return sb.toString();
    }

    private static char standardEscape(final CharBuffer buf, final char c)
    {
        if (!buf.hasRemaining())
            throw new IllegalArgumentException();
        final char[] array = { c, buf.get() };
        return (char) Integer.parseInt(new String(array), 16);
    }

    private static char msEscape(final CharBuffer buf)
    {
        if (buf.remaining() < 4)
            throw new IllegalArgumentException();
        final char[] array = new char[4];
        buf.get(array);
        return (char) Integer.parseInt(new String(array), 16);
    }

    public static void main(final String... args)
    {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        System.out.println(unescape(input));
    }
}

Output:

AT&amp;T Network Client – IBM
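For comparison, the same decoding can be sketched more compactly with a regular expression. This is an alternative under stated assumptions, not the code above: the class name RegexUnescaper is made up, and unlike MSUnescaper it leaves a malformed or dangling % in place instead of throwing IllegalArgumentException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexUnescaper {
    // Group 1: the four hex digits of %uXXXX; group 2: the two hex digits of %XX.
    private static final Pattern ESCAPE =
        Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    private RegexUnescaper() {
    }

    public static String unescape(final String input) {
        final Matcher m = ESCAPE.matcher(input);
        final StringBuilder sb = new StringBuilder(input.length());
        int last = 0;

        while (m.find()) {
            // Copy the literal text between the previous escape and this one.
            sb.append(input, last, m.start());
            final String hex = m.group(1) != null ? m.group(1) : m.group(2);
            sb.append((char) Integer.parseInt(hex, 16));
            last = m.end();
        }
        // Copy whatever follows the last escape (or the whole input if none matched).
        sb.append(input, last, input.length());
        return sb.toString();
    }

    public static void main(final String... args) {
        System.out.println(unescape("AT%26amp%3BT%20Network%20Client%20%u2013%20IBM"));
    }
}
```

Either way, the decoded text still contains the HTML entity &amp;, so if the original value was also HTML-escaped, a second pass such as the StringEscapeUtils.unescapeHtml call from the question is still needed on the result.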

Answer 1 (score: -1)

String str = "AT%26amp%3BT%20Network%20Client%20%[here]u[here]2013%20IBM" I think this string is invalid. %u20 is not valid. If you remove the u from the string, you can decode it. For reference: w3schools HTML URL encoding