Question

The SMS gateway HTTP API works fine when I send Unicode text from a browser,

i.e. the Unicode for नमस्ते:

%26%232344%3B%26%232350%3B%26%232360%3B%26%232381%3B%26%232340%3B%26%232375%3B

If I send this Unicode from the browser through the following API provided by the SMS gateway:

http://msdgweb.mgov.gov.in/esms/sendsmsrequest?username=*****&password=****&smsservicetype=unicodemsg&content=%26%232344%3B%26%232350%3B%26%232360%3B%26%232381%3B%26%232340%3B%26%232375%3B&mobileno=*****&senderid=****

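the SMS I receive on my phone is: नमस्ते

When I use the same API from Java with UTF-8 Unicode, the text that arrives is the literal string %26%232344%3B%26%232350%3B%26%232360%3B%26%232381%3B%26%232340%3B%26%232375%3B.

What am I missing in my application code?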

Answer 1

String text = "नमस्ते";

Somehow the text was translated into HTML entities:

&#2344;&#2350;&#2360;&#2381;&#2340;&#2375;

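This can happen when an HTML form is posted to a server that has not declared that it accepts, for example, UTF-8 (Unicode). The browser then converts the input fields to entities. So also state the charset in the form: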

<form action="..." accept-charset="UTF-8">

This string is then URL-encoded: & becomes %26, # becomes %23, ; becomes %3B, and so on.
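The two steps described above (characters to decimal HTML entities, then URL encoding) can be sketched in Java as follows. This is only an illustration under the assumption that the gateway expects the `content` parameter in exactly the form shown in the question; the class name `EntityEncodeDemo` and its method names are hypothetical, and the text is written with Unicode escapes so the source file's encoding does not matter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EntityEncodeDemo {
    // Turn every code point into a decimal HTML entity, e.g. U+0928 -> &#2344;
    static String toHtmlEntities(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> sb.append("&#").append(cp).append(';'));
        return sb.toString();
    }

    // URL-encode the entity string so that & -> %26, # -> %23, ; -> %3B
    static String encodeContent(String text) {
        try {
            return URLEncoder.encode(toHtmlEntities(text), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u0928\u092E\u0938\u094D\u0924\u0947"; // the word from the question
        // Prints %26%232344%3B%26%232350%3B%26%232360%3B%26%232381%3B%26%232340%3B%26%232375%3B
        System.out.println(encodeContent(text));
    }
}
```

If the result is passed to an HTTP client that encodes query parameters itself, it would be encoded a second time, so only one layer should do the URL encoding.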

As a patch to convert HTML entities (such as &#1234;) back into characters, use Apache Commons:

s = StringEscapeUtils.unescapeHtml4(s);

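Or do the conversion yourself:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Replaces decimal HTML entities such as &#2344; with the characters they denote.
String convertHtmlEntities(String s) {
     Pattern pattern = Pattern.compile("\\&#(\\d{1,7});");
     Matcher m = pattern.matcher(s);
     StringBuffer sb = new StringBuffer();
     while (m.find()) {
         int cp = Integer.parseInt(m.group(1));
         String ch = new String(new int[] { cp }, 0, 1);
         // quoteReplacement keeps '$' and '\' (e.g. from &#36;) literal
         m.appendReplacement(sb, Matcher.quoteReplacement(ch));
     }
     m.appendTail(sb);
     return sb.toString();
}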
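To apply such a conversion to the raw string from the question, the URL encoding has to be undone first, and only then can the entities be replaced. A minimal sketch, assuming the input is the percent-encoded entity string shown in the question; the class name `EntityDecodeDemo` and the helper `decodeContent` are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecodeDemo {
    private static final Pattern ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Replace decimal HTML entities with the code points they denote.
    static String convertHtmlEntities(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int cp = Integer.parseInt(m.group(1));
            String ch = new String(Character.toChars(cp));
            m.appendReplacement(sb, Matcher.quoteReplacement(ch));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Step 1: undo the URL encoding (%26 -> &, %23 -> #, %3B -> ;).
    // Step 2: turn the resulting entities back into characters.
    static String decodeContent(String raw) {
        try {
            return convertHtmlEntities(URLDecoder.decode(raw, "UTF-8"));
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "%26%232344%3B%26%232350%3B%26%232360%3B"
                   + "%26%232381%3B%26%232340%3B%26%232375%3B";
        String expected = "\u0928\u092E\u0938\u094D\u0924\u0947";
        System.out.println(decodeContent(raw).equals(expected)); // true
    }
}
```

An entity whose number is not a valid Unicode code point makes `Character.toChars` throw an `IllegalArgumentException`, so untrusted input may need an extra range check.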
