Question

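XSSFCell seems to convert certain character sequences into Unicode characters. How can I prevent this? Do I need to apply some kind of character escaping?

For example: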

cell.setCellValue("LUS_BO_WP_x24B8_AI"); // The cell value is now "LUS_BO_WPⒸAI"

In Unicode, Ⓒ is U+24B8.

I have already tried setting an ANSI font and setting the cell type to string.

Answer 1

This character conversion is done in XSSFRichTextString.utfDecode().
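To make the behavior concrete, here is a minimal, standalone sketch of the kind of decoding described above. It is not POI's source code: the class name UtfDecodeSketch, the method name decode, and the exact pattern (four hex digits, either case) are assumptions chosen for illustration, and the pattern POI uses may differ between versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the decoding step; it does not depend on POI.
final class UtfDecodeSketch {
    // Assumed escape format: "_x", four hex digits, "_".
    private static final Pattern ESCAPED = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String decode(String value) {
        if (value == null) return null;
        StringBuilder out = new StringBuilder();
        Matcher m = ESCAPED.matcher(value);
        int idx = 0;
        while (m.find()) {
            // Copy the text before the escape sequence unchanged.
            out.append(value, idx, m.start());
            // Turn the four hex digits into the character they name.
            out.append((char) Integer.parseInt(m.group(1), 16));
            idx = m.end();
        }
        out.append(value, idx, value.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // The _x24B8_ sequence is replaced by the character U+24B8.
        System.out.println(decode("LUS_BO_WP_x24B8_AI"));
        // Here _x005F_ decodes to a plain underscore, so "x24B8_" survives as text.
        System.out.println(decode("LUS_BO_WP_x005F_x24B8_AI"));
    }
}
```

Under these assumptions, the way to keep a literal _x24B8_ in the cell is to write the underscore that starts the sequence as _x005F_.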

I have now written a function that basically does the same thing in reverse.

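import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches the _xHHHH_ sequences that utfDecode() would turn into a character.
private static final Pattern utfPtrn = Pattern.compile("_(x[0-9A-F]{4}_)");

// Escaped form of a literal underscore (U+005F LOW LINE).
private static final String UNICODE_CHARACTER_LOW_LINE = "_x005F_";

public static String escape(final String value) {
    if (value == null) return null;

    StringBuilder buf = new StringBuilder();
    Matcher m = utfPtrn.matcher(value);
    int idx = 0;
    while (m.find()) {
        // Copy the text between the previous match and this one unchanged.
        buf.append(value, idx, m.start());
        // Replace the leading underscore with its escaped form, so that
        // "_x24B8_" becomes "_x005F_x24B8_".
        buf.append(UNICODE_CHARACTER_LOW_LINE).append(m.group(1));
        idx = m.end();
    }
    // Copy whatever follows the last match.
    buf.append(value, idx, value.length());
    return buf.toString();
}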
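One edge case worth checking is two sequences that share an underscore, such as "_x0041_x0042_". A matcher loop that consumes the trailing underscore of the first sequence cannot match the second one, so it stays unescaped. The sketch below is a possible variant, not part of the original answer: it uses a lookahead so only the leading underscore is consumed. The names LookaheadEscapeSketch, escape, and decode are hypothetical, and decode is a simplified stand-in for POI's decoding, assumed to work on _xHHHH_ sequences from left to right.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant of the escape function plus a stand-in decoder.
final class LookaheadEscapeSketch {
    // An underscore that starts an _xHHHH_ sequence; the lookahead consumes nothing.
    private static final Pattern ESCAPABLE = Pattern.compile("_(?=x[0-9A-Fa-f]{4}_)");
    // Assumed escape format handled by the decoder.
    private static final Pattern ESCAPED = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String escape(String value) {
        if (value == null) return null;
        // Write each such underscore in its escaped form, _x005F_.
        return ESCAPABLE.matcher(value).replaceAll("_x005F_");
    }

    // Simplified decoder used only to check the round trip (an assumption, not POI code).
    static String decode(String value) {
        if (value == null) return null;
        StringBuilder out = new StringBuilder();
        Matcher m = ESCAPED.matcher(value);
        int idx = 0;
        while (m.find()) {
            out.append(value, idx, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            idx = m.end();
        }
        out.append(value, idx, value.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String[] samples = { "LUS_BO_WP_x24B8_AI", "_x0041_x0042_", "no_escapes_here" };
        for (String s : samples) {
            String escaped = escape(s);
            // Prints the escaped form and whether decoding restores the input.
            System.out.println(escaped + " -> round trip ok: " + s.equals(decode(escaped)));
        }
    }
}
```

The lookahead keeps the trailing underscore available for the next match, which is why adjacent sequences are both escaped.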

Answer 2

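Based on @matthias-gerth's suggestion, with a few modifications:

Create your own XSSFRichTextString class.
Adjust XSSFRichTextString.setString like this: st.setT(s); >> st.setT(escape(s));
Modify the XSSFRichTextString constructor like this: st.setT(str); >> st.setT(escape(str));

Add the following to XSSFRichTextString (this is very close to Matthias's suggestion):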

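 // Requires java.util.regex.Matcher and java.util.regex.Pattern.
 // Matches "_x" followed by four hex digits; the trailing underscore is not required.
 private static final Pattern PATTERN = Pattern.compile("_x[a-fA-F0-9]{4}");
 // Prefix that, together with the matched underscore, forms "_x005F_".
 private static final String UNICODE_CHARACTER_LOW_LINE = "_x005F";

 private String escape(String str) {
     if (str != null) {
         Matcher m = PATTERN.matcher(str);
         if (m.find()) {
             StringBuilder buf = new StringBuilder();
             int idx = 0;
             do {
                 // Copy the text before this match unchanged.
                 buf.append(str, idx, m.start());
                 // "_x24B8" becomes "_x005F_x24B8".
                 buf.append(UNICODE_CHARACTER_LOW_LINE).append(m.group(0));
                 idx = m.end();
             } while (m.find());
             // Copy whatever follows the last match.
             buf.append(str, idx, str.length());
             return buf.toString();
         }
     }
     // Null or nothing to escape: return the input as is.
     return str;
 }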

XSSFCell in Apache POI encodes certain character sequences as Unicode characters