Question

So, I am having some issues with the following character, –, in my Kotlin port of imgui.

After spending a whole day researching charsets and encodings, I am down to my only hope: relying on the Unicode code point.

On the JVM, that char

"–"[0].toInt() // same as codePointAt()

returns the code point U+2013.

On the C side I am not sure, but since this is what is done:

const ImFontGlyph* ImFont::FindGlyph(ImWchar c) const
{
    if (c >= IndexLookup.Size)
        return FallbackGlyph;
    const ImWchar i = IndexLookup.Data[c];
    if (i == (ImWchar)-1)
        return FallbackGlyph;
    return &Glyphs.Data[i];
}

where

typedef unsigned short ImWchar;

and

ImVector<ImWchar> IndexLookup; // Sparse. Index glyphs by Unicode code-point.

So, doing this

const char* a = "–";
int b = (unsigned char)a[0]; // cast avoids a negative value where plain char is signed

returns the code point U+0096.

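From what I understand so far, above 127 (0x7F) we seem to be in "extended ASCII" territory, which is bad because there appear to be different versions/interpretations of it.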

For example, this encoding table does not match my code point, but the Cp1252 encoding does, so I tend to think that this is what is actually used on the C side.

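In the table at the bottom of the link just mentioned, you can indeed see that 150 (in decimal, counting from the right-hand column with the given numbers) corresponds to 2013 (in hexadecimal, which I find a little incoherent, but whatever).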
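The table lookup described above can be reproduced on the JVM. The following is a minimal sketch, not part of the original code: `decodeByte` is a hypothetical helper, and it assumes the JVM provides the "windows-1252" charset (common in practice, but not guaranteed by the Java specification). It suggests that 0x96 is a Cp1252 byte value rather than a Unicode code point, and that the same byte means something else under ISO-8859-1.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: decode a single byte value with the named charset
// and return the resulting Unicode code point.
fun decodeByte(value: Int, charsetName: String): Int =
    String(byteArrayOf(value.toByte()), Charset.forName(charsetName))[0].code

fun main() {
    // Byte 0x96 (150 decimal) read as Windows-1252 is the en dash.
    println("U+%04X".format(decodeByte(0x96, "windows-1252"))) // U+2013
    // The same byte read as ISO-8859-1 is a C1 control character.
    println("U+%04X".format(decodeByte(0x96, "ISO-8859-1")))   // U+0096
    // For comparison, the en dash takes three bytes in UTF-8.
    println("\u2013".toByteArray(Charsets.UTF_8).joinToString(" ") { "%02X".format(it) }) // E2 80 93
}
```

`Char.code` needs Kotlin 1.5 or later; on older versions `toInt()` plays the same role.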

To fix this, I tried converting my String in Kotlin to the same encoding (ignoring for a moment that this is of course platform specific), so for each c: Char

"$c".toByteArray(Charset.forName("Cp1252"))[0].toUnsignedInt

This works, but it breaks the display of non-Latin scripts such as Chinese, Japanese, etc.
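A likely reason for the breakage, sketched under the same assumption that the JVM provides "windows-1252": Cp1252 has no slot for CJK characters, so `toByteArray` substitutes the charset's replacement byte (`?`, 0x3F) and the original character is lost. The helper names `firstCp1252Byte` and `isRepresentable` are hypothetical and only illustrate the effect.

```kotlin
import java.nio.charset.Charset

// Assumption: this charset is available on the running JVM.
private val cp1252: Charset = Charset.forName("windows-1252")

// Same idea as the one-liner above: encode one char, take the first byte as unsigned.
fun firstCp1252Byte(c: Char): Int =
    c.toString().toByteArray(cp1252)[0].toInt() and 0xFF

// True only if Cp1252 has a byte for this char.
fun isRepresentable(c: Char): Boolean = cp1252.newEncoder().canEncode(c)

fun main() {
    println(firstCp1252Byte('\u2013')) // en dash: 150 (0x96)
    println(firstCp1252Byte('\u4E2D')) // CJK ideograph: 63, i.e. '?'
    println(isRepresentable('\u4E2D')) // false
}
```

Checking `isRepresentable` first, and keeping the plain code point when it returns false, would at least stop unmappable characters from collapsing into `?`.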

So, my question is: why is there a difference between U+2013 on the JVM and U+0096 in C?

Which is the right way to deal with this?

Answer 1

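For the moment, on Windows, I solved it like this: I interpose this function before retrieving the char's code point. It basically remaps all the characters that differ from ISO-8859-1. You can see them in this table; they are the ones with a light grey border.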

internal fun Char.remapCodepointIfProblematic(): Int {
    val i = toInt()
    return when (Platform.get()) {
        /*  https://en.wikipedia.org/wiki/Windows-1252#Character_set
         *  manually remap the difference from  ISO-8859-1 */
        Platform.WINDOWS -> when (i) {
            // 8_128
            0x20AC -> 128 // €
            0x201A -> 130 // ‚
            0x0192 -> 131 // ƒ
            0x201E -> 132 // „
            0x2026 -> 133 // …
            0x2020 -> 134 // †
            0x2021 -> 135 // ‡
            0x02C6 -> 136 // ˆ
            0x2030 -> 137 // ‰
            0x0160 -> 138 // Š
            0x2039 -> 139 // ‹
            0x0152 -> 140 // Œ
            0x017D -> 142 // Ž
            // 9_144
            0x2018 -> 145 // ‘
            0x2019 -> 146 // ’
            0x201C -> 147 // “
            0x201D -> 148 // ”
            0x2022 -> 149 // •
            0x2013 -> 150 // –
            0x2014 -> 151 // —
            0x02DC -> 152 // ˜
            0x2122 -> 153 // ™
            0x0161 -> 154 // š
            0x203A -> 155 // ›
            0x0153 -> 156 // œ
            0x017E -> 158 // ž
            0x0178 -> 159 // Ÿ
            else -> i
        }
        else -> i // TODO
    }
}
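As an alternative sketch of the same remapping, the table could be derived from the JVM's own Cp1252 decoder instead of being typed by hand. This is not the author's code: `cp1252HighRemap` and `remapViaCharset` are hypothetical names, the platform check is left out, and it assumes the "windows-1252" charset is available. Bytes that Cp1252 leaves undefined (such as 0x81) appear to decode to U+FFFD on the JVM, so they are skipped.

```kotlin
import java.nio.charset.Charset

// Assumption: this charset is available on the running JVM.
private val cp1252: Charset = Charset.forName("windows-1252")

// Unicode code point -> Cp1252 byte value, for the 0x80..0x9F block only,
// which is where Cp1252 and ISO-8859-1 disagree.
val cp1252HighRemap: Map<Int, Int> = (0x80..0x9F).mapNotNull { byte ->
    val ch = String(byteArrayOf(byte.toByte()), cp1252)[0]
    // Skip bytes with no assigned character (decoded as the replacement char).
    if (ch == '\uFFFD') null else ch.code to byte
}.toMap()

// Hypothetical counterpart of remapCodepointIfProblematic(), without the platform switch.
fun Char.remapViaCharset(): Int = cp1252HighRemap[code] ?: code

fun main() {
    println('\u2013'.remapViaCharset()) // en dash -> 150
    println('\u20AC'.remapViaCharset()) // euro sign -> 128
    println('\u4E2D'.remapViaCharset()) // CJK ideograph is left untouched: 20013
}
```

Characters outside that block, including all CJK code points, pass through unchanged, which matches the `else -> i` branch above.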

Codepoint mismatch between Java and C
