Question

JLine is a module for intercepting user input at a console before the user presses Enter. It uses JNA or similar wizardry.

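I'm doing a few experiments with it, and I get encoding problems when I start entering more "exotic" Unicode characters. The OS here is Windows 10 and I'm using Cygwin. The code is in Groovy, but it should be obvious to Java people.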

def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()
terminal.enterRawMode()
// NB the Terminal I get is class org.jline.terminal.impl.PosixSysTerminal
def reader = terminal.reader()

def bytes = [] // NB class ArrayList
int readInt = -1
while( readInt != 13 && readInt != 10 ) {
    readInt = reader.read()
    byte convertedByte = (byte)readInt
    // see what the binary looks like:
    String binaryString = String.format("%8s", Integer.toBinaryString( convertedByte & 0xFF)).replace(' ', '0')
    println "binary |$binaryString|"
    bytes << (byte)readInt // NB means "append to list"
    println ">>> read |$readInt| byte |$convertedByte|"
}
// strip final byte (13 or 10)
bytes = bytes[0..-2]
println "z bytes $bytes, class ${bytes.class.name}"
def response = new String( (byte[])bytes.toArray(), 'UTF-8' )
// to get proper out encoding for Cygwin I then need to do this (I have no idea why!)
def psOut = new PrintStream(System.out, true, 'UTF-8' )
psOut.print( "using PrintStream: |$response|" )

This works OK for characters that take one byte in UTF-8, and letters like "é" (2 bytes) are handled fine. But "ẃ" goes wrong:

ẃ --> Unicode U+1E83 
    UTF-8 HEX: 0xE1 0xBA 0x83 (e1ba83) 
    BINARY: 11100001:10111010:10000011
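
The expected byte values above can be checked independently of JLine. The following is a minimal Java sketch (the class and method names are made up for illustration) that prints the UTF-8 encoding of a string as colon-separated binary groups, using the same formatting trick as the Groovy code in the question.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Returns the UTF-8 bytes of s as colon-separated 8-bit binary groups.
    static String toBinary(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            // Mask to 0..255, then left-pad the binary digits to 8 places.
            sb.append(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1E83 is "ẃ"; prints 11100001:10111010:10000011
        System.out.println(toBinary("\u1E83"));
    }
}
```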

The binary actually read when you enter "ẃ" is 11100001:10111010:10010010.

This translates to U+1E92, which is another Polish character, "Ẓ". And that is indeed what gets printed in the response String.
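
One possible explanation for the changed third byte, offered as a hypothesis and not something confirmed in this thread: terminal.reader() returns a Reader, so read() yields decoded characters, not raw bytes. If that reader were decoding the incoming bytes as windows-1252 instead of UTF-8 (an assumption), byte 0x83 would become "ƒ" (U+0192), and the (byte) cast would keep only its low 8 bits, 0x92. Bytes 0xE1 and 0xBA map to U+00E1 and U+00BA in windows-1252, so they survive the cast unchanged, as do both bytes of "é". The sketch below reproduces that arithmetic in plain Java; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Truncation {
    // Decode raw bytes with a (possibly wrong) charset, then narrow each char
    // to a byte, mimicking `(byte) reader.read()` from the question.
    static byte[] decodeThenTruncate(byte[] raw, Charset assumedCharset) {
        String decoded = new String(raw, assumedCharset);
        byte[] out = new byte[decoded.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) decoded.charAt(i); // keeps only the low 8 bits
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u1E83".getBytes(StandardCharsets.UTF_8); // E1 BA 83
        // Assumption: the reader decodes as windows-1252.
        byte[] mangled = decodeThenTruncate(utf8, Charset.forName("windows-1252"));
        for (byte b : mangled) {
            System.out.printf("%02X ", b & 0xFF); // E1 BA 92
        }
        System.out.println();
        String result = new String(mangled, StandardCharsets.UTF_8);
        System.out.printf("U+%04X%n", (int) result.charAt(0)); // U+1E92
    }
}
```

If this is what is happening, the values returned by read() should be appended to a String as chars, or the terminal should be given the correct encoding, instead of treating each value as a byte.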

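Unfortunately, the reader that the JLine package hands you is of class org.jline.utils.NonBlocking$NonBlockingInputStreamReader ... so I don't really know what I can do to investigate its encoding (I presume UTF-8) or to modify it somehow ... Can anyone explain what the problem is?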

Answer 1

As far as I can tell, this relates to a Cygwin-specific problem, as asked and answered by me a year ago.

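There is a solution, in my answer to the question I asked directly after this one ... which correctly handles Unicode input, even outside the Basic Multilingual Plane, using JLine ... and using the Cygwin console ... hopefully.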
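
The linked answer is not reproduced here, so the following is only a generic sketch of one ingredient such a solution needs, not the author's code. It reads chars from any java.io.Reader until CR or LF and appends them to a StringBuilder without narrowing them to bytes. Characters outside the Basic Multilingual Plane arrive as two UTF-16 code units (a surrogate pair) and are preserved. The class name is hypothetical, and a StringReader stands in for the JLine reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CharLineReader {
    // Reads UTF-16 code units until CR, LF, or end of stream.
    static String readLine(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = reader.read()) != -1 && c != '\r' && c != '\n') {
                sb.append((char) c); // keep the full 16-bit value, no byte cast
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "ẃ" (U+1E83) followed by U+1F600, which needs a surrogate pair.
        String line = readLine(new StringReader("\u1E83\uD83D\uDE00\r"));
        System.out.println(line.length());                         // 3 code units
        System.out.println(line.codePointCount(0, line.length())); // 2 code points
    }
}
```

This only helps if the reader itself decodes the console bytes with the right charset; choosing that charset for the Cygwin console is the part the linked answer addresses.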

Encoding problem with JLine
