URL-encoding data for a POST request body. Am I using the wrong charset?

Time: 2011-10-28 13:43:17

Tags: java utf-8 character-encoding http-post iso-8859-1

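I want to replicate a working POST request in Java. For testing purposes, let's take the following message: 'äöõüäöõüäöõüäöõü'

The working POST request (with the message 'äöõüäöõüäöõüäöõü' encoded):

Headers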

POST http://www.mysite.com/newreply.php?do=postreply&t=477352 HTTP/1.1
Host: www.warriorforum.com
Connection: keep-alive
Content-Length: 403
Origin: http://www.mysite.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko)Chrome/14.0.835.202 Safari/535.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
Referer: http://www.mysite.com/test-forum/477352-test.html
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: bblastvisit=1319205053; bblastactivity=0; bbuserid=265374; bbpassword=1125e9ec1ab41f532ab8ec6f77ddaf94; bbsessionhash=91444317c100996990a04d6c5bbd8375;

Body

securitytoken=1319806096-618e5f9012901e2d818bf2c74c2121baa064be57&ajax=1&ajax_lastpost=1319806096&**message=%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC**&wysiwyg=0&styleid=1&signature=1&fromquickreply=1&s=&do=postreply&t=477352&p=who%20cares&specifiedpost=0&parseurl=1&loggedinuser=265374

As we can see in the request body, 'äöõüäöõüäöõüäöõü' is encoded as: %u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC

Now I want to reproduce it.

Let's encode the text in Java using the charset utf-8:

String userText = "äöõüäöõüäöõüäöõü";
String encoded = URLEncoder.encode(userText, "utf-8");

Result: %C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%0A%0A%0A%5BSIZE%3D%221%22%5D%5BI%5D <-- not the same

Let's try ISO-8859-1:

String userText = "äöõüäöõüäöõüäöõü";
String encoded = URLEncoder.encode(userText, "ISO-8859-1");

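Result: %E4%F6%F5%FC%E4%F6%F5%FC%E4%F6%F5%FC%E4%F6%F5%FC%0A%0A%0A%5BSIZE%3D%221%22%5D%5BI%5D <-- not the same

Neither of them produces the same encoded string as the working example, even though both start from the same input. What am I missing here?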

1 answer:

Answer 0 (score: 4)

%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC

I don't know what encoding the data above is in, but it is not application/x-www-form-urlencoded; charset=UTF-8, as the request claims. It is not legal data for that MIME type.
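One way to check this claim is to feed both strings to the JDK's own form decoder. The sketch below is only an illustration: the class name FormDecodeCheck and the helper isDecodable are made up for this example, and the check only shows what java.net.URLDecoder accepts, not what the forum's server does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class FormDecodeCheck {
    /** Hypothetical helper: true if URLDecoder accepts the value as UTF-8 form data. */
    static boolean isDecodable(String value) {
        try {
            URLDecoder.decode(value, "UTF-8");
            return true;
        } catch (IllegalArgumentException e) {
            // URLDecoder throws this when '%' is not followed by two hex digits.
            return false;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Output of URLEncoder.encode("äöõü", "utf-8")
        System.out.println(isDecodable("%C3%A4%C3%B6%C3%B5%C3%BC")); // true
        // The form found in the captured request body
        System.out.println(isDecodable("%u00E4%u00F6%u00F5%u00FC")); // false
    }
}
```

The second call fails because "u0" is not a pair of hex digits, which is what makes %uXXXX invalid as a percent escape.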

It looks like some form of UTF-16BE encoding: each %uXXXX carries one 16-bit code unit written as four hex digits.
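If the goal is only to reproduce the captured body byte for byte, the %uXXXX form can be generated by hand. This is a minimal sketch under assumptions: PercentUEncoder is a hypothetical class, the set of characters left unescaped is a guess, and nothing here confirms that this is how the page's script builds the body or that the server requires it.

```java
class PercentUEncoder {
    /**
     * Hypothetical helper: keeps a small set of ASCII characters as-is, writes
     * other ASCII characters as %XX and every non-ASCII UTF-16 code unit as %uXXXX.
     */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || "-_.*".indexOf(c) >= 0;
            if (unreserved) {
                out.append(c);
            } else if (c < 0x80) {
                out.append(String.format("%%%02X", (int) c));   // e.g. ' ' becomes %20
            } else {
                out.append(String.format("%%u%04X", (int) c));  // e.g. 'ä' becomes %u00E4
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "äöõü" written with Unicode escapes so the source file encoding does not matter
        System.out.println(encode("\u00E4\u00F6\u00F5\u00FC")); // %u00E4%u00F6%u00F5%u00FC
    }
}
```

Characters outside the Basic Multilingual Plane would come out as two %uXXXX escapes (a surrogate pair), since the loop walks UTF-16 code units rather than code points.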

URLEncoder.encode(userText, "utf-8"); would be the correct way to encode an application/x-www-form-urlencoded; charset=UTF-8 value, if that is actually what the server expects. (ref