Question

I'm handling UTF-8 strings in JavaScript and need to escape them.

Both escape()/unescape() and encodeURI()/decodeURI() work in my browser.

escape()

> var hello = "안녕하세요"
> var hello_escaped = escape(hello)
> hello_escaped
  "%uC548%uB155%uD558%uC138%uC694"
> var hello_unescaped = unescape(hello_escaped)
> hello_unescaped
  "안녕하세요"

encodeURI()

> var hello = "안녕하세요"    
> var hello_encoded = encodeURI(hello)
> hello_encoded
  "%EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94"
> var hello_decoded = decodeURI(hello_encoded)
> hello_decoded
  "안녕하세요"

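However, Mozilla says that escape() is deprecated.

Although encodeURI() and decodeURI() work with the UTF-8 string above, the docs (as well as the function names themselves) tell me that these methods are for URIs; I don't see UTF-8 strings mentioned anywhere.

Simply put, is it OK to use encodeURI() and decodeURI() for UTF-8 strings?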

Answer 1

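Hi!

When it comes to escape and unescape, I live by two rules:

Avoid them when you easily can.
Otherwise, use them.

Avoiding them when you easily can:

As mentioned in the question, both escape and unescape have been deprecated. In general, one should avoid using deprecated functions.

So, if encodeURIComponent or encodeURI does the job for you, you should use that instead of escape.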
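
As a minimal sketch of that advice (the variable names and the sample query string are illustrative, not from the question), the round trip from the question works without escape. It also shows the main difference between the two functions: encodeURI leaves URI delimiters such as = and & alone, while encodeURIComponent escapes them.

```javascript
// Round trip the string from the question without escape()/unescape().
var hello = "안녕하세요";
var encoded = encodeURIComponent(hello);
var decoded = decodeURIComponent(encoded);

console.log(encoded);           // "%EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94"
console.log(decoded === hello); // true

// encodeURI keeps URI delimiters; encodeURIComponent escapes them.
var query = "a=1&b=안";
console.log(encodeURI(query));          // "a=1&b=%EC%95%88"
console.log(encodeURIComponent(query)); // "a%3D1%26b%3D%EC%95%88"
```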

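Using them when you can't easily avoid them:

Browsers will, as far as possible, strive to achieve backwards compatibility. All major browsers have already implemented escape and unescape; why would they un-implement them?

Browsers would have to redefine escape and unescape if a new specification required them to do so. But wait! The people who write specifications are quite smart. They, too, are interested in not breaking backwards compatibility!

I realize that the above argument is weak. But trust me, ... when it comes to browsers, deprecated stuff works. This even includes deprecated HTML tags like <xmp> and <center>.

Using `escape` and `unescape`:

So naturally, the next question is, when would one use escape or unescape?

Recently, while working on CloudBrave, I had to deal with utf8, latin1 and conversions between them.

After reading a bunch of blog posts, I realized how simple this was: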

var utf8_to_latin1 = function (s) {
    return unescape(encodeURIComponent(s));
};
var latin1_to_utf8 = function (s) {
    return decodeURIComponent(escape(s));
};

These inter-conversions, without using escape and unescape, are rather involved. By not avoiding escape and unescape, life becomes simpler.
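
To make it concrete what the two helpers produce, here is a small sketch. It assumes an environment where escape, unescape and TextEncoder are all available (current browsers and Node.js); the sample character and the variable names byteString and bytes are my own. The "latin1" result is a string holding one character per UTF-8 byte.

```javascript
var utf8_to_latin1 = function (s) {
    return unescape(encodeURIComponent(s));
};
var latin1_to_utf8 = function (s) {
    return decodeURIComponent(escape(s));
};

var original = "\u00E9";                      // "é", which is 0xC3 0xA9 in UTF-8
var byteString = utf8_to_latin1(original);

console.log(byteString.length);               // 2 (one character per byte)
console.log(byteString.charCodeAt(0));        // 195 (0xC3)
console.log(byteString.charCodeAt(1));        // 169 (0xA9)
console.log(latin1_to_utf8(byteString) === original); // true

// A non-deprecated way to obtain the same bytes, for comparison.
var bytes = Array.from(new TextEncoder().encode(original));
console.log(bytes);                           // [195, 169]
```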

Hope this helps.

Answer 2

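Mozilla says that escape() is deprecated.

Yes, you should avoid both escape() and unescape().

Simply put, is it OK to use encodeURI() and decodeURI() for UTF-8 strings?

Yes, but depending on the form of your input and the required form of your output, you may need some extra work.

From your question I assume you have a JavaScript string, and you want to convert its encoding to UTF-8 and finally store the string in some escaped form.

First of all, it's important to note that the JavaScript string encoding is UCS-2, which is similar to UTF-16 and different from UTF-8.

See: https://mathiasbynens.be/notes/javascript-encoding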
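
A short illustration of why the distinction matters (my own example, not taken from the linked article): a string's length counts 16-bit code units, so a character outside the Basic Multilingual Plane takes two units, and an unpaired surrogate cannot be converted to UTF-8 at all.

```javascript
var face = "\uD83D\uDE00";                // U+1F600 stored as a surrogate pair
console.log(face.length);                  // 2 code units for 1 code point
console.log(encodeURIComponent(face));     // "%F0%9F%98%80" (4 UTF-8 bytes)

var threw = false;
try {
    encodeURIComponent("\uD83D");          // lone high surrogate
} catch (e) {
    threw = e instanceof URIError;
}
console.log(threw);                        // true
```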

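encodeURIComponent() is useful for the job, because it converts the UCS-2 JavaScript string into UTF-8 and escapes it as a sequence of %nn substrings, where each nn is the two hex digits of one byte.

However, encodeURIComponent() does not escape letters, digits and a few other characters in the ASCII range. But this is easy to fix.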
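
For reference, a quick sketch of both behaviours (the sample strings are mine): non-ASCII characters and most ASCII punctuation become %nn sequences, while letters, digits and the characters - _ . ! ~ * ' ( ) pass through unchanged.

```javascript
console.log(encodeURIComponent("\u20AC"));   // "%E2%82%AC" (the euro sign, 3 UTF-8 bytes)
console.log(encodeURIComponent(" /?&"));     // "%20%2F%3F%26"

// Characters that encodeURIComponent leaves as they are.
var untouched = "AZaz09-_.!~*'()";
console.log(encodeURIComponent(untouched) === untouched); // true
```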

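For example, if you want to convert a JavaScript string into an array of numbers representing the bytes of the original string encoded as UTF-8, you may use this function: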

//
// Convert JavaScript UCS2 string to array of bytes representing the string UTF8 encoded
//

function StringUTF8AsBytesArrayFromString( s )
{
    var i,
        n,
        u;

    u = [];
    s = encodeURIComponent( s );

    n = s.length;
    for( i = 0; i < n; i++ )
    {
        if( s.charAt( i ) == '%' )
        {
            u.push( parseInt( s.substring( i + 1, i + 3 ), 16 ) );
            i += 2;
        }
        else
        {
            u.push( s.charCodeAt( i ) );
        }
    }

    return u;
}
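
As a sanity check, the sketch below condenses the same technique into a helper with the illustrative name utf8Bytes and compares its result with TextEncoder, which is assumed to be available (current browsers and Node.js) and yields the bytes directly.

```javascript
// Same idea as the function above, condensed; utf8Bytes is a hypothetical name.
function utf8Bytes(s) {
    var out = [];
    var enc = encodeURIComponent(s);
    for (var i = 0; i < enc.length; i++) {
        if (enc.charAt(i) === '%') {
            // "%nn" is one UTF-8 byte written as two hex digits.
            out.push(parseInt(enc.substring(i + 1, i + 3), 16));
            i += 2;
        } else {
            // Unescaped ASCII characters are their own byte value.
            out.push(enc.charCodeAt(i));
        }
    }
    return out;
}

var sample = "A\uC548";                       // "A" followed by the first syllable of the question's string
var viaEscaping = utf8Bytes(sample);
var viaEncoder = Array.from(new TextEncoder().encode(sample));
console.log(viaEscaping);                     // [65, 236, 149, 136]
console.log(viaEncoder);                      // [65, 236, 149, 136]
```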

If you want to convert the string to its hexadecimal representation:

//
// Convert JavaScript UCS2 string to hex string representing the bytes of the string UTF8 encoded
//

function StringUTF8AsHexFromString( s )
{
    // The parameter s is reused for the output, so it needs no second declaration.
    var u,
        i,
        n;

    u = StringUTF8AsBytesArrayFromString( s );
    n = u.length;
    s = '';    

    for( i = 0; i < n; i++ )
    {
        s += ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );
    }

    return s;
}

If you change the line inside the for loop to

s += '%' + ( u[ i ] < 16 ? '0' : '' ) + u[ i ].toString( 16 );

(adding a % sign in front of each two-digit hex byte)

the resulting escaped string (UTF-8 encoded) can be turned back into a JavaScript UCS-2 string with decodeURIComponent().
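
A minimal sketch of that round trip follows. The helper name percentEncodeAllBytes is hypothetical, and it obtains the bytes from TextEncoder (assumed available) rather than from the functions above, purely to keep the example self-contained.

```javascript
// Escape every UTF-8 byte as %nn, including plain ASCII letters.
function percentEncodeAllBytes(s) {
    var bytes = new TextEncoder().encode(s);
    var out = '';
    for (var i = 0; i < bytes.length; i++) {
        out += '%' + (bytes[i] < 16 ? '0' : '') + bytes[i].toString(16);
    }
    return out;
}

var text = "hi \uC548";
var escaped = percentEncodeAllBytes(text);
console.log(escaped);                              // "%68%69%20%ec%95%88"
console.log(decodeURIComponent(escaped) === text); // true
```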

Answer 3

It is never okay to use encodeURI() or encodeURIComponent(). Let's try it out:

console.log(encodeURIComponent('@#*'));

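Input: @#*. Output: %40%23*. Wait, so what exactly happened to the * character? How come it wasn't converted? Imagine this: you ask a user what file to delete, and their response is *. Server-side, you convert that using encodeURIComponent() and then run rm *. Well, I've got news for you: using encodeURIComponent() means you just deleted all files.

Use fixedEncodeURI() when trying to encode a complete URL (i.e., all of example.com?arg=val), as defined and further explained in the MDN encodeURI() Documentation...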

function fixedEncodeURI(str) {
   return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
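
To see what the fix changes, here is a small sketch; the URL with an IPv6 host is a made-up example. Plain encodeURI escapes the square brackets, and the replacement restores them.

```javascript
function fixedEncodeURI(str) {
    return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}

var url = "http://[::1]:8080/a b?q=\uC548";   // hypothetical URL
console.log(encodeURI(url));       // "http://%5B::1%5D:8080/a%20b?q=%EC%95%88"
console.log(fixedEncodeURI(url));  // "http://[::1]:8080/a%20b?q=%EC%95%88"
```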

Or, you may need to use fixedEncodeURIComponent() when trying to encode part of a URL (i.e., the arg or the val in example.com?arg=val), as defined and further explained in the MDN encodeURIComponent() Documentation...

function fixedEncodeURIComponent(str) {
 return encodeURIComponent(str).replace(/[!'()*]/g, function(c) {
   return '%' + c.charCodeAt(0).toString(16);
 });
}
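
Applied to the earlier input, a quick check looks like this (the extra test strings are mine). Note that toString(16) produces lowercase hex digits; appending .toUpperCase() would give %2A instead, and decodeURIComponent accepts either form.

```javascript
function fixedEncodeURIComponent(str) {
    return encodeURIComponent(str).replace(/[!'()*]/g, function (c) {
        return '%' + c.charCodeAt(0).toString(16);
    });
}

console.log(encodeURIComponent("@#*"));        // "%40%23*"
console.log(fixedEncodeURIComponent("@#*"));   // "%40%23%2a"
console.log(fixedEncodeURIComponent("!'()"));  // "%21%27%28%29"

// The extra escapes still decode back to the original text.
var roundTrip = decodeURIComponent(fixedEncodeURIComponent("@#*"));
console.log(roundTrip === "@#*");              // true
```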

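If you can't tell them apart from the description above, I always like to simplify it to:

fixedEncodeURI(): will not encode +@?=:#;,$& to their http-encoded equivalents (as & and + are common URL operators)
fixedEncodeURIComponent(): will encode +@?=:#;,$& to their http-encoded equivalents.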
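
The same split can be seen with the built-in functions, since the fixed variants only change how [ ] and ! ' ( ) * are handled. A brief sketch:

```javascript
var reserved = "+@?=:#;,$&";
console.log(encodeURI(reserved));           // "+@?=:#;,$&" (left unchanged)
console.log(encodeURIComponent(reserved));  // "%2B%40%3F%3D%3A%23%3B%2C%24%26"
```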

Using encodeURI() vs. escape() for utf-8 strings in JavaScript