Question

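I maintain a small webapp based on Java servlets that presents input forms and writes the contents of those forms to MariaDB.

The app runs on a Linux box, but users access the webapp from Windows.

Some users paste text into these forms that was copied from MS Word documents, and when that happens they get internal exceptions like the following:

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Incorrect string value: '\xC2\x96 for...' for column 'ssimpact' at row 1

For instance, I tested it with text like the following:

Project - for

The dash there is a "long dash" from the MS Word document.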
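It may help to look at what the two bytes quoted in the exception actually are. The following is a minimal sketch (the class and method names are made up for illustration) that decodes `\xC2\x96` as UTF-8 and prints the resulting code point. The byte pair is well-formed UTF-8 for U+0096, a C1 control character, and 0x96 happens to be the windows-1252 byte for the en dash (U+2013). One possible reading, which is an assumption and not verified against this setup, is that the form data was decoded with the wrong charset somewhere before it reached JDBC.

```java
import java.nio.charset.StandardCharsets;

class InspectBytes {
    // Decodes the given bytes as UTF-8 and lists each resulting code point as U+XXXX.
    static String describe(byte[] utf8Bytes) {
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        decoded.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two bytes quoted in the exception message.
        byte[] fromError = {(byte) 0xC2, (byte) 0x96};
        System.out.println(describe(fromError)); // U+0096

        // For comparison: the en dash as it would look if it had arrived intact.
        byte[] enDash = "\u2013".getBytes(StandardCharsets.UTF_8);
        System.out.println(describe(enDash)); // U+2013
    }
}
```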

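I don't think it's possible to convert arbitrary characters in this text to "proper" characters, so I'm trying to figure out how to produce a reasonable error message that shows a substring of the offending text, along with the index of the first bad character.

I noticed postings like this one: How to determine if a String contains invalid encoded characters.

I thought this would get me close, but it's not quite working.

I'm trying to use the following method: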

private int findUnmappableCharIndex(String entireString) {
    int charIndex;
    for (charIndex = 0; charIndex < entireString.length(); ++charIndex) {
        String currentChar = entireString.substring(charIndex, charIndex + 1);
        CharBuffer out = CharBuffer.wrap(new char[currentChar.length()]);
        CharsetDecoder decoder = Charset.forName("utf-8").newDecoder();
        CoderResult result = decoder.decode(ByteBuffer.wrap(currentChar.getBytes()), out, true);
        if (result.isError() || result.isOverflow() || result.isUnderflow() || result.isMalformed() || result.isUnmappable()) {
            break;
        }
        CoderResult flushResult = decoder.flush(out);
        if (flushResult.isOverflow()) {
            break;
        }
    }
    if (charIndex == entireString.length() + 1) {
        charIndex = -1;
    }
    return charIndex;
}

This doesn't work. I get "underflow" on the first character, which is a valid character. I'm sure I don't fully understand the decoder mechanism.
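On the "underflow" result: according to the `CoderResult` documentation, `UNDERFLOW` means the input buffer was completely consumed (or that more input is needed), so it is the normal outcome of a successful `decode` call and not an error. A small sketch to illustrate, with a hypothetical class name; only `isError()`, `isMalformed()` and `isUnmappable()` signal a problem here:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class UnderflowDemo {
    // Decodes the bytes as UTF-8 and returns the CoderResult of the decode call.
    static CoderResult decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // One char per input byte is always enough room for UTF-8 input.
        CharBuffer out = CharBuffer.allocate(bytes.length + 1);
        return decoder.decode(ByteBuffer.wrap(bytes), out, true);
    }

    public static void main(String[] args) {
        // Valid input: all bytes consumed, so the result is UNDERFLOW.
        CoderResult ok = decode("A".getBytes(StandardCharsets.UTF_8));
        System.out.println(ok.isUnderflow() + " " + ok.isError()); // true false

        // A lone continuation byte is not well-formed UTF-8.
        CoderResult bad = decode(new byte[] {(byte) 0x96});
        System.out.println(bad.isUnderflow() + " " + bad.isMalformed()); // false true
    }
}
```

Note also that a Java `String` is already decoded text (UTF-16 internally), so encoding it with `getBytes()` and decoding it again mostly tests the round trip and not the bytes the browser originally sent.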

How do I find out which character does not map to UTF-8?
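One way to approach this, sketched under assumptions: check each code point with `CharsetEncoder.canEncode` against the charset the text must fit into. The class and method names below are hypothetical, and the target charset is an assumption, since it depends on how the `ssimpact` column and the JDBC connection are configured. For UTF-8 itself, essentially every Java string is encodable (only unpaired surrogates fail), so the check is more informative against a narrower charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class UnmappableFinder {
    // Returns the char index of the first code point that the target charset
    // cannot encode, or -1 if the whole string is encodable.
    static int findFirstUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int len = Character.charCount(codePoint); // 2 for surrogate pairs
            if (!encoder.canEncode(text.subSequence(i, i + len))) {
                return i;
            }
            i += len;
        }
        return -1;
    }

    // Builds an error message quoting a short window of text around the index.
    // The window is cut by char index, so it may split a surrogate pair.
    static String describe(String text, int index) {
        int start = Math.max(0, index - 10);
        int end = Math.min(text.length(), index + 10);
        return String.format("Unsupported character U+%04X at index %d near \"%s\"",
                text.codePointAt(index), index, text.substring(start, end));
    }

    public static void main(String[] args) {
        String input = "Project \u2013 for"; // en dash at index 8
        // ISO-8859-1 is only a stand-in for the real target charset.
        int index = findFirstUnencodable(input, StandardCharsets.ISO_8859_1);
        System.out.println(index); // 8
        if (index >= 0) {
            System.out.println(describe(input, index));
        }
        System.out.println(findFirstUnencodable(input, StandardCharsets.UTF_8)); // -1
    }
}
```

Iterating by code point instead of by `char` keeps surrogate pairs together, which a `substring(i, i + 1)` loop would split.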

0 answers: