Java: make a String into an encoded String

Time: 2018-05-27 16:04:52

Tags: java encoding utf-8

My HTTP GET from the server is:

// HTTP GET request
private static List sendGet() throws Exception {

    String url = "http://********/ReciveMessage";

    URL obj = new URL(url);
    HttpURLConnection con = (HttpURLConnection) obj.openConnection();

    // optional default is GET
    con.setRequestMethod("GET");

    //add request header

    con.setRequestProperty("Accept-Charset", "UTF-8");

    int responseCode = con.getResponseCode();

    BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(),"UTF-8"));
    String inputLine;
    StringBuffer response = new StringBuffer();

    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
        System.out.println(inputLine);
    }
    in.close();


    String str = response.toString(); //str is the problem


}

I get a string from the server that looks like this:

str = "\\u05d0";

I noticed that I cannot decode the string.

So I would like to know how to turn it into this:

str = "\u05d0";

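2 answers:

Answer 0 (score: 1)

Assuming your server returns only a stream of Unicode code points encoded in the form you describe (no raw characters, only code points of the form \u1234), the following code converts such a sequence into decoded characters: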

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeDecoder {

    private static final Pattern UNICODE_CHARACTER_PATTERN =
            Pattern.compile("\\\\u([0-9A-Fa-f]{2,4})");

    public static void main(String[] args) {
        String raw = "\\u05d0\\u05d1\\u05d2\\u05d3";

        StringBuilder sb = new StringBuilder(raw.length() / 7);

        Matcher matcher = UNICODE_CHARACTER_PATTERN.matcher(raw);
        while (matcher.find()) {
            String hexCode = matcher.group(1);
            char[] decodedChars = Character.toChars(
                    Integer.valueOf(hexCode, 16));
            sb.append(decodedChars);
        }

        System.out.println("Raw:\n"+raw);
        System.out.println("Decoded:\n"+sb.toString());
    }   
}

This example code produces the output:

Raw:
\u05d0\u05d1\u05d2\u05d3
Decoded:
אבגד

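Note that this approach is not very efficient. If performance matters, you could rework it to pick out each \u1234 sequence by hand and append the decoded character to the StringBuilder. That would eliminate the cost of the regex Matcher.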
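As a rough illustration of that regex-free variant, here is a minimal sketch. It assumes the input contains nothing but well-formed escapes (a backslash, a `u`, and exactly four hex digits) and rejects anything else. The class name `EscapeOnlyDecoder` is made up for this example.

```java
// Sketch only: decodes a string made up entirely of backslash-u escapes
// without using java.util.regex. EscapeOnlyDecoder is a hypothetical name.
final class EscapeOnlyDecoder {

    static String decode(String raw) {
        // Every escape is exactly 6 characters long: backslash, 'u', 4 hex digits.
        if (raw.length() % 6 != 0) {
            throw new IllegalArgumentException("Input is not a sequence of backslash-u escapes");
        }
        StringBuilder sb = new StringBuilder(raw.length() / 6);
        for (int i = 0; i < raw.length(); i += 6) {
            if (raw.charAt(i) != '\\' || raw.charAt(i + 1) != 'u') {
                throw new IllegalArgumentException("Expected an escape at index " + i);
            }
            // Accumulate the four hex digits without allocating substrings.
            int code = 0;
            for (int j = i + 2; j < i + 6; j++) {
                int digit = Character.digit(raw.charAt(j), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException("Bad hex digit at index " + j);
                }
                code = code * 16 + digit;
            }
            // Each escape encodes one UTF-16 code unit, so a cast to char is enough.
            sb.append((char) code);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u05d0\\u05d1\\u05d2\\u05d3"));
    }
}
```

Because each escape maps to a single UTF-16 code unit, characters outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair) and are reassembled simply by appending both units in order.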

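If your server returns characters other than Unicode code point escapes, then you will have to walk through the server's response character by character, checking for \u1234 sequences. Anything that is not an escape sequence should be appended to the StringBuilder as-is; every escape sequence should first be decoded into a character.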
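The character-by-character scan described above could look roughly like the sketch below. It assumes an escape always has exactly four hex digits and copies anything that does not match through unchanged. `MixedUnicodeDecoder` and `escapeAt` are hypothetical names chosen for illustration.

```java
// Sketch only: decodes backslash-u escapes embedded in otherwise plain text.
// MixedUnicodeDecoder and escapeAt are hypothetical names.
final class MixedUnicodeDecoder {

    static String decode(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            int code = escapeAt(raw, i);
            if (code >= 0) {
                // A well-formed escape starts here: emit the decoded code unit.
                sb.append((char) code);
                i += 6;
            } else {
                // Not an escape: copy the character through untouched.
                sb.append(raw.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // Returns the code unit encoded at index i, or -1 if no well-formed
    // escape (backslash, 'u', four hex digits) starts at that index.
    private static int escapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return -1;
        }
        int code = 0;
        for (int j = i + 2; j < i + 6; j++) {
            int digit = Character.digit(s.charAt(j), 16);
            if (digit < 0) {
                return -1;
            }
            code = code * 16 + digit;
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(decode("abc \\u05d0\\u05d1 xyz"));
    }
}
```

Truncated or malformed escapes such as `\u12` are left in the output verbatim rather than raising an error. That is a design choice of this sketch, and a stricter caller may prefer to throw.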

Answer 1 (score: -1)

I took Bobulous's solution and modified it; it now handles ASCII and UTF-8 in the same string:

private String Decode(String raw) {
    final Pattern UNICODE_CHARACTER_PATTERN = Pattern.compile("\\\\u([0-9A-Fa-f]{2,4})");

    StringBuilder sb = new StringBuilder(raw.length() / 7);

    Matcher matcher = UNICODE_CHARACTER_PATTERN.matcher(raw);

    while (raw.length() != 0) {
        if (raw.charAt(0) == '\\') {
            matcher = UNICODE_CHARACTER_PATTERN.matcher(raw);
            String hexCode = "";
            char[] decodedChars = null;
            boolean find = false;
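            // lookingAt() only matches an escape at the very start of raw;
            // find() would also match one further along and corrupt the output.
            if (matcher.lookingAt()) {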
                find = true;
                hexCode = matcher.group(1);
                decodedChars = Character.toChars(Integer.valueOf(hexCode, 16));
                sb.append(decodedChars);
            }
            if(find)
                raw = raw.substring(matcher.group(0).length());
            else {
                if(raw.length() >= 2) { // a trailing two-character escape such as \n must be handled too
                    char c = (raw.charAt(1));
                    raw = raw.substring(2);
                    switch(c) {
                        case 'n':
                            sb.append("\n");
                            break;
                        case 't':
                            sb.append("\t");
                            break;
                        case 'b':
                            sb.append("\b");
                            break;
                        case 'r':
                            sb.append("\r");
                            break;
                        case 'f':
                            sb.append("\f");
                            break;
                        case '\\':
                            sb.append("\\");
                            break;
                        case '\'':
                            sb.append("'");
                            break;
                        case '\"':
                            sb.append("\"");
                            break;
                        default:
                            sb.append("\\"+c);
                            break;
                    }
                }else {
                    raw = raw.substring(1);
                    sb.append("\\");
                }
            }
        } else {
            sb.append(raw.charAt(0));
            raw = raw.substring(1);
        }
    }       

    return sb.toString();
}

Update: added cases for \n, \t, etc.
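For comparison, the \u part of this mixed-content decoding can also be written with the Matcher's own appendReplacement/appendTail loop, which avoids re-creating the matcher and re-slicing the string on every step. The sketch below is an alternative, not the code from either answer. It assumes escapes have exactly four hex digits and deliberately leaves \n, \t and similar two-character escapes untouched. `RegexMixedDecoder` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: replaces backslash-u escapes in mixed text in a single regex pass.
// RegexMixedDecoder is a hypothetical name.
final class RegexMixedDecoder {

    // Assumption: an escape is a backslash, 'u', and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuffer sb = new StringBuffer(raw.length());
        while (m.find()) {
            String decoded = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as a group reference or escape in the replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(decoded));
        }
        // Copy whatever follows the last escape.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Hello \\u05d0\\u05d1, world"));
    }
}
```

The `StringBuffer` overload of `appendReplacement` is used because it is available on every Java version; a `StringBuilder` overload exists from Java 9 onwards.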