Question

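In my HBase table there are some encoded emoji, such as \xF0\x9F\x8C\x8F and \xE2\x9A\xBE. I am trying to decode them with Bytes.toString(). That method uses UTF-8, but it only seems to decode three-byte sequences such as \xE2\x9A\xBE correctly; four-byte sequences such as \xF0\x9F\x8C\x8F come out as a question mark (see below). How can I decode the four-byte sequences into emoji and print them? Does anyone have an idea? Thanks in advance!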

Example:

The result should be:

But I get:

Sorry, I forgot to mention that I am using a servlet to query HBase and write the content to the response.

Answer 1

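When I read a file containing the character "🌏" (bytes F09F8C8F, code point U+1F30F), which has a BOM indicating UTF-8 encoding, I get a correct String by decoding the bytes as UTF-8 with: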

byte[] encoded = Files.readAllBytes(selectedFile.toPath());
String fileContents = new String(encoded, StandardCharsets.UTF_8);

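The resulting String is converted correctly and displays correctly in my Java Swing application. But if I print the same string to the console, I get a boxed question mark instead of the symbol. So the character is decoded correctly; it is only your output that garbles it.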
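One way to confirm that the decoding step itself is fine, independent of how a console or browser renders the result, is to inspect the code points of the decoded String. The following is a minimal sketch; the class name EmojiDecodeCheck and its helper methods are made up for illustration, and it uses plain new String(bytes, UTF_8), which should behave like Bytes.toString() as described in the question.

```java
import java.nio.charset.StandardCharsets;

class EmojiDecodeCheck {
    // Decode raw UTF-8 bytes (for example, a cell value read from HBase) into a String.
    static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Format each code point as U+XXXX so the result can be checked
    // without depending on the console's font or encoding.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        byte[] fourByte = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
        byte[] threeByte = { (byte) 0xE2, (byte) 0x9A, (byte) 0xBE };

        System.out.println(describe(decode(fourByte)));   // U+1F30F
        System.out.println(describe(decode(threeByte)));  // U+26BE
        // A code point above U+FFFF is stored as a surrogate pair, i.e. two chars.
        System.out.println(decode(fourByte).length());    // 2
    }
}
```

If the four-byte sequence reports U+1F30F here, the String is correct and any question mark appears later, on the output side.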

To reproduce this, you can use:

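import java.nio.charset.StandardCharsets;
import javax.swing.JDialog;
import javax.swing.JLabel;
import javax.swing.WindowConstants;

public class EncodingTest {
  public static void main(String[] args) throws Exception {
    // UTF-8 encoding of U+1F30F
    byte[] encoded = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
    String convertedstring = new String(encoded, StandardCharsets.UTF_8);

    // Console: rendering depends on the console's encoding and font
    System.out.println("convertedstring: " + convertedstring);

    // Swing: shows the same String in a dialog
    JDialog dialog = new JDialog();
    dialog.setSize(300, 100);
    dialog.setLocationRelativeTo(null);
    dialog.setTitle("encoding-test");
    dialog.setDefaultCloseOperation(WindowConstants.DISPOSE_ON_CLOSE);
    JLabel label = new JLabel("convertedstring: " + convertedstring);
    dialog.add(label);

    dialog.setVisible(true);
  }
}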

Console output:

JDialog output:

You may also want to look at "Default character encoding for java console output" and "Java, UTF-8, and Windows console".
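To illustrate how the output side alone can turn a correct String into a question mark, here is a small sketch that writes the same text through two Writers with different charsets. The class OutputEncodingDemo and the method writeWith are hypothetical names, and the in-memory stream only stands in for a real console or servlet response. For the servlet case mentioned in the question, the analogous step would presumably be calling response.setCharacterEncoding("UTF-8") before response.getWriter(), since a response writer falls back to ISO-8859-1 when no encoding is specified; that is an assumption about the asker's setup, not something shown in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class OutputEncodingDemo {
    // Write text through a Writer bound to the given charset and
    // return the raw bytes that would be sent to the client or console.
    static byte[] writeWith(Charset charset, String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, charset)) {
            out.write(text);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = { (byte) 0xF0, (byte) 0x9F, (byte) 0x8C, (byte) 0x8F };
        String emoji = new String(encoded, StandardCharsets.UTF_8);

        // UTF-8 can represent the emoji: the original four bytes come back.
        byte[] asUtf8 = writeWith(StandardCharsets.UTF_8, emoji);
        System.out.println(asUtf8.length); // 4

        // ISO-8859-1 cannot represent it: the encoder substitutes '?'.
        byte[] asLatin1 = writeWith(StandardCharsets.ISO_8859_1, emoji);
        System.out.println(new String(asLatin1, StandardCharsets.ISO_8859_1));
    }
}
```

The String is identical in both calls; only the charset of the writer differs, which matches the observation that the conversion is fine and the output is what breaks.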
