Question

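When implementing an HTTP service in node.js, there is a lot of sample code like the following for reading the whole request entity (data uploaded by the client, for example a POST with JSON data):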

var http = require('http');

var server = http.createServer(function(req, res) {
    var data = '';
    req.setEncoding('utf8');

    req.on('data', function(chunk) {
        data += chunk;
    });

    req.on('end', function() {
        // parse data
    });
});

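Calling req.setEncoding('utf8') automatically decodes the input bytes into a string, assuming the input is UTF8 encoded. But I get the feeling that it can break. What if we receive a chunk of data that ends in the middle of a multi-byte UTF8 character? We can simulate this: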

> new Buffer("café")
<Buffer 63 61 66 c3 a9>
> new Buffer("café").slice(0,4)
<Buffer 63 61 66 c3>
> new Buffer("café").slice(0,4).toString('utf8')
'caf?'

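So we get an erroneous character instead of waiting for the next bytes to properly decode the last character.

Therefore, unless the request object takes care of this and makes sure that only completely decoded characters are pushed into chunks, this ubiquitous code sample is broken.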

The alternative would be to use buffers, which means dealing with buffer size limits:

var http = require('http');
var MAX_REQUEST_BODY_SIZE = 16 * 1024 * 1024;

var server = http.createServer(function(req, res) {
    // A better way to do this could be to start with a small buffer
    // and grow it geometrically until the limit is reached.
    var requestBody = new Buffer(MAX_REQUEST_BODY_SIZE); 
    var requestBodyLength = 0;

    req.on('data', function(chunk) {
        if(requestBodyLength + chunk.length >= MAX_REQUEST_BODY_SIZE) {
           res.statusCode = 413; // Request Entity Too Large
           return;
        }
        chunk.copy(requestBody, requestBodyLength, 0, chunk.length);
        requestBodyLength += chunk.length;
    });

    req.on('end', function() {
        if(res.statusCode == 413) {
            // handle 413 error
            return;
        }

        requestBody = requestBody.toString('utf8', 0, requestBodyLength);
        // process requestBody as string
    });
});
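
A lighter variant of the same idea, as a sketch only: rather than preallocating the full 16 MB, keep the incoming chunks in an array and join them once with Buffer.concat. Decoding the joined bytes in one step means a character split across two chunks is reassembled before it is decoded. The names createBodyCollector, push and isTooLarge are hypothetical helpers invented for this illustration, and the sketch assumes a node version that provides Buffer.concat.

```javascript
var http = require('http');
var MAX_REQUEST_BODY_SIZE = 16 * 1024 * 1024;

// Hypothetical helper (not part of node): gathers Buffer chunks up to
// maxBytes and decodes them in one step at the end.
function createBodyCollector(maxBytes) {
    var chunks = [];
    var length = 0;
    var tooLarge = false;

    return {
        // Returns false once the limit has been exceeded.
        push: function(chunk) {
            if (tooLarge || length + chunk.length > maxBytes) {
                tooLarge = true;
                chunks = [];   // drop what was buffered so far
                length = 0;
                return false;
            }
            chunks.push(chunk);
            length += chunk.length;
            return true;
        },
        isTooLarge: function() {
            return tooLarge;
        },
        // Join all bytes first, then decode once.
        toString: function() {
            return Buffer.concat(chunks, length).toString('utf8');
        }
    };
}

var server = http.createServer(function(req, res) {
    var body = createBodyCollector(MAX_REQUEST_BODY_SIZE);

    req.on('data', function(chunk) {
        body.push(chunk);
    });

    req.on('end', function() {
        if (body.isTooLarge()) {
            res.statusCode = 413; // Request Entity Too Large
            res.end();
            return;
        }
        var requestBody = body.toString();
        // process requestBody as string
        res.end();
    });
});
// Calling server.listen(...) is left to the caller.
```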

Am I right, or is this already taken care of by the http request class?

Answer 1

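This is taken care of automatically. Node has a string_decoder module, which is loaded when you call setEncoding. The decoder checks the last few bytes received and, if they do not form a complete character, holds them back between emits of 'data', so the 'data' handler always gets a correctly decoded string. If you do not call setEncoding and do not use string_decoder yourself, then the emitted buffers can have the problem you mention.

The docs are not much help (http://nodejs.org/docs/latest/api/string_decoder.html), but you can read the module here: https://github.com/joyent/node/blob/master/lib/string_decoder.js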

The implementation of 'setEncoding' and the logic for emitting data also make this clearer:

setEncoding: https://github.com/joyent/node/blob/master/lib/http.js#L270
_emitData: https://github.com/joyent/node/blob/master/lib/http.js#L306
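
The buffering described in this answer can also be observed directly by using string_decoder on the question's "café" example. This is a minimal sketch, not the code path inside http itself; it assumes a node version that provides Buffer.from (the current replacement for new Buffer), and the variable names are chosen for illustration.

```javascript
// Sketch: feed a UTF-8 string to StringDecoder in two pieces, split
// in the middle of the two-byte character "é" (0xC3 0xA9).
var StringDecoder = require('string_decoder').StringDecoder;

var bytes = Buffer.from('café', 'utf8');   // <Buffer 63 61 66 c3 a9>
var decoder = new StringDecoder('utf8');

// The first chunk ends with 0xC3, the lead byte of "é". The decoder
// returns only the complete characters and holds the lead byte back.
var first = decoder.write(bytes.slice(0, 4));

// The second chunk supplies 0xA9, which completes the character.
var second = decoder.write(bytes.slice(4));

// end() returns whatever is still buffered (nothing in this case).
var rest = decoder.end();

console.log(JSON.stringify([first, second, rest]));
// prints ["caf","é",""]
```

Compare this with the toString('utf8') call in the question, which decodes the truncated buffer on its own and so cannot wait for the missing byte.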

Answer 2

Just add response.setEncoding('utf8'); to the request.on('response') callback function. In my case, that was enough.

Answer 3

// Post : 'tèéïst3 ùél'
// Node return : 't%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l'
decodeURI('t%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l');
// Return 'tèéïst3+ùél'
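
Note that the decoded result above still contains the "+" where the posted text had a space. In an application/x-www-form-urlencoded body, "+" stands for a space, and neither decodeURI nor decodeURIComponent converts it. A hedged sketch of one way to handle a single value, assuming the body really is form-urlencoded (decodeFormValue is a hypothetical helper name):

```javascript
// Sketch: decode one application/x-www-form-urlencoded value.
// decodeFormValue is a hypothetical helper, not a node API.
function decodeFormValue(value) {
    // "+" encodes a space in this format; decodeURIComponent leaves
    // it untouched, so convert it before percent-decoding.
    return decodeURIComponent(value.replace(/\+/g, ' '));
}

var decoded = decodeFormValue('t%C3%A8%C3%A9%C3%AFst3+%C3%B9%C3%A9l');
console.log(decoded); // prints tèéïst3 ùél
```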

Problem parsing UTF8 characters in request body?
