Question

URL = "http://example.com",
Header = [],
Type = "application/json",
Content = "我是中文",

Body = lists:concat(["{\"type\":\"0\",\"result\":[{\"url\":\"test.cn\",\"content\":\"", unicode:characters_to_list(Content), "\"}]}"]),
lager:debug("URL:~p, Body:~p~n", [URL, Body]),
HTTPOptions = [],
Options = [],
Response = httpc:request(post, {URL, Header, Type, Body}, HTTPOptions, Options),

But the body of the HTTP request that the server receives is not 我是中文. How can I fix this?

Answer 1

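Luck of the Encoding

You have to take special care to ensure the input is what you think it is, because it may differ from what you expect.

This answer applies to the Erlang release I'm running, which is R16B03-1. I'll try to include all of the details here so you can test with your own install and verify.

If you don't take specific action to change it, a string will be interpreted as follows:

In the Terminal (OS X 10.9.2)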

TerminalContent = "我是中文",
TerminalContent = [25105,26159,20013,25991].

In the terminal, the string is interpreted as a list of unicode characters.

In a Module

BytewiseContent = "我是中文",
BytewiseContent = [230,136,145,230,152,175,228,184,173,230,150,135].

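In a module, the default encoding is latin1 (this is the R16 behavior; from Erlang/OTP 17 onward the default source encoding is UTF-8), and strings containing unicode characters are interpreted as bytewise lists (of UTF8 bytes).

If you use data encoded like BytewiseContent, unicode:characters_to_list/1 will double-encode the Chinese characters, and ææ¯ä will be sent to the server where you expected 我是中文.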
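To make the double encoding concrete, here is a minimal sketch. The module name double_encoding_demo is made up for illustration, and the characters are written as integers so the result does not depend on how the source file itself is decoded.

```erlang
-module(double_encoding_demo).
-export([demo/0]).

%% Returns {SizeOfCorrectUtf8, SizeOfDoubleEncodedUtf8} in bytes.
demo() ->
    %% The four code points of the sample text.
    Chars = [25105,26159,20013,25991],
    %% Correct UTF-8 encoding: 3 bytes per character.
    Utf8 = unicode:characters_to_binary(Chars),
    %% What a latin1 module sees: one list element per UTF-8 byte.
    Bytewise = binary_to_list(Utf8),
    %% Encoding that list again treats every byte as its own character,
    %% so each byte above 127 becomes two bytes.
    Double = unicode:characters_to_binary(Bytewise),
    {byte_size(Utf8), byte_size(Double)}.
```

The correctly encoded body is 12 bytes, while the double-encoded one is 24 bytes, which is what the server ends up displaying as mojibake.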

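Solution

Specify the encoding for each source file and term file.
If you run an erl command line, ensure it is set up to use unicode.
If you read data from files, translate the bytes from the bytewise encoding to unicode before processing (this goes for binary data acquired using httpc:request/N as well).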
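For the last point, a small sketch of decoding bytes before processing them might look like the following. The module name decode_demo and the error atoms are assumptions chosen for illustration; only unicode:characters_to_list/2 comes from the standard library.

```erlang
-module(decode_demo).
-export([decode_utf8/1]).

%% Turn UTF-8 bytes (a binary, or a bytewise list) into a list of
%% unicode code points, reporting malformed input instead of crashing.
decode_utf8(Bytes) when is_list(Bytes) ->
    decode_utf8(list_to_binary(Bytes));
decode_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin, utf8) of
        Chars when is_list(Chars)  -> {ok, Chars};
        {error, _Good, _Rest}      -> {error, invalid_utf8};
        {incomplete, _Good, _Rest} -> {error, truncated_utf8}
    end.
```

The same function can be applied to a body returned by httpc:request/N when the body_format option is set to binary.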

If you embed unicode characters in your module, ensure that you indicate as much with a comment within the first two lines of your module:

%% -*- coding: utf-8 -*-

This will change the way the module interprets the string such that:

UnicodeContent = "我是中文",
UnicodeContent = [25105,26159,20013,25991].
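One way to confirm how your own build reads string literals is a tiny self-check module. This is only a sketch: the module name encoding_check is hypothetical, and the file is assumed to be saved as UTF-8.

```erlang
%% -*- coding: utf-8 -*-
-module(encoding_check).
-export([literal_is_unicode/0]).

%% true when the compiler decoded the literal into four code points,
%% false when it produced twelve UTF-8 bytes instead.
literal_is_unicode() ->
    "我是中文" =:= [25105,26159,20013,25991].
```

If the function returns false, the module is still being read bytewise and the coding comment (or the file's actual encoding) needs another look.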

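Once you have ensured that you are concatenating characters and not bytes, the concatenation is safe. Don't use unicode:characters_to_list/1 to convert your string/list until the whole thing has been built up.

Example Code

The following function works as expected when given a Url and a list of unicode character Content: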

http_post_content(Url, Content) ->
    ContentType = "application/json",
    %% Concat the list of (character) lists
    Body = lists:concat(["{\"content\":\"", Content, "\"}"]),
    %% Explicitly encode to UTF8 before sending
    UnicodeBin = unicode:characters_to_binary(Body),
    httpc:request(post,
        {
            Url,
            [],          % HTTP headers
            ContentType, % content-type
            UnicodeBin   % the body as binary (UTF8)
            },
        [],            % HTTP Options
        [{body_format,binary}] % indicate the body is already binary
        ).

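To verify the results, I wrote the following HTTP server using node.js and express. The sole purpose of this dead-simple server is to sanity check the problem and solution.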

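var express = require('express'),
    bodyParser = require('body-parser'),
    util = require('util');

var app = express();

// Parse JSON request bodies; current body-parser releases expect an
// explicit parser such as bodyParser.json() rather than bare bodyParser()
app.use(bodyParser.json());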

app.get('/', function(req, res){
  res.send('You probably want to perform an HTTP POST');
});

app.post('/', function(req, res){
  console.log("body: " + util.inspect(req.body, false, 99));
  res.json(req.body);
});

app.listen(3000);


Verify

Again in Erlang, the following function checks that the HTTP response contains the echoed JSON and that the exact unicode characters were returned.

verify_response({ok, {{_, 200, _}, _, Response}}, SentContent) ->
    %% use jiffy to decode the JSON response
    {Props} = jiffy:decode(Response),
    %% pull out the "content" property value
    ContentBin = proplists:get_value(<<"content">>, Props),
    %% convert the binary value to unicode characters,
    %% it should equal what we sent.
    case unicode:characters_to_list(ContentBin) of
        SentContent -> ok;
        Other ->
            {error, [
                {expected, SentContent},
                {received, Other}
                ]}
    end;
verify_response(Unexpected, _) ->
    {error, {http_request_failed, Unexpected}}.

The complete example.erl module is posted in a Gist.

Once you've got the example module compiled and an echo server running, you'll want to run something like this in an Erlang shell:

inets:start().

Url = example:url().

Content = example:content().

Response = example:http_post_content(Url, Content).

If you've got jiffy set up, you can also verify that the content made the round trip:

example:verify_response(Response, Content).

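You should now be able to confirm round-trip encoding of any unicode content.

Translating Between Encodings

While I explained the encodings above, you will have noticed that TerminalContent, BytewiseContent, and UnicodeContent are all lists of integers. You should endeavor to code in a manner that allows you to be certain what you have in hand.

The oddball encoding is bytewise, which may turn up when working with modules that are not "unicode aware". Erlang's guidance on working with unicode mentions this near the bottom under the heading Lists of UTF-8 Bytes. To translate bytewise lists use: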

%% from http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).
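When you cannot be sure which kind of list you were handed, a defensive normalizer can help. The following is only a heuristic sketch with a made-up module name (normalize_demo): it assumes that a list whose elements all fit in a byte and that decodes cleanly as UTF-8 was meant as bytewise data, which can misjudge genuine latin1 text that happens to form valid UTF-8.

```erlang
-module(normalize_demo).
-export([to_chars/1]).

%% Best-effort conversion of a string of unknown origin to code points.
to_chars(List) when is_list(List) ->
    IsByte = fun(C) -> is_integer(C) andalso C >= 0 andalso C =< 255 end,
    case lists:all(IsByte, List) of
        false ->
            %% At least one element is not a byte value:
            %% treat the list as code points already.
            List;
        true ->
            %% Every element fits in a byte: try to read it as UTF-8.
            case unicode:characters_to_list(list_to_binary(List), utf8) of
                Chars when is_list(Chars) -> Chars;
                _NotUtf8 -> List   % not valid UTF-8, keep as latin1 text
            end
    end.
```

Knowing the encoding at every boundary remains the more reliable approach; a guess like this is best kept at the edges of the system.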

My Setup

As far as I know, I have no local settings that modify Erlang's behavior. My Erlang is R16B03-1, built and distributed by Erlang Solutions, and my machine runs OS X 10.9.2.

How to support Chinese in an HTTP request body? (Erlang)
