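I'm having a problem with Ruby (1.9.3) and PowerShell.
I need to write an interactive console application that processes sentences in Polish. I have already been helped with retrieving ARGV elements that contain Polish diacritics, but standard input does not behave the way I need.
Code illustration: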
# encoding: UTF-8
target = ARGV[0].dup.force_encoding('CP1250').encode('UTF-8')
puts "string constant = dupą"
puts "dupą".bytes.to_a.to_s
puts "dupą".encoding
puts "target = " +target
puts target.bytes.to_a.to_s
puts target.encoding
puts target.eql? "dupą"
STDIN.set_encoding("CP1250", "UTF-8")
# the line above changes nothing, it can be removed and the result is still the same
# I obviously wanted to mimic the ARGV solution
target2 = STDIN.gets
puts "target2 = " +target2
puts target2.bytes.to_a.to_s
puts target2.encoding
puts target2.eql? "dupą"
Output:
string constant = dupą
[100, 117, 112, 196, 133]
UTF-8
target = dupą
[100, 117, 112, 196, 133]
UTF-8
true
dupą //this is fed to STDIN.gets
target2 = dup
[100, 117, 112]
UTF-8
false
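Apparently Ruby never receives the fourth character from STDIN.gets. If I type a longer string such as dupąlalala, still only the three initial bytes show up in the program.
I have set $OutputEncoding to [Console]::OutputEncoding; it now looks like this: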
IsSingleByte : True
BodyName : ibm852
EncodingName : Środkowoeuropejski (DOS)
HeaderName : ibm852
WebName : ibm852
WindowsCodePage : 1250
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : False
IsMailNewsSave : False
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 852
I am using the Consolas font.
How can I read Polish diacritics correctly in PowerShell?
Answer 0 (score: 1)
I found some related information. I'm not sure it is entirely correct, but hey, the OP already has another solution.
# Get "encoding" for code page 1250 (Central European)
$en=[System.Text.Encoding]::GetEncoding(1250)
# Looks like this:
IsSingleByte : True
BodyName : iso-8859-2
EncodingName : Central European (Windows)
HeaderName : windows-1250
WebName : windows-1250
WindowsCodePage : 1250
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1250
# Change STDIN's input encoding
[console]::InputEncoding=$en
$x = Read-Host
# I typed in dupą
# (I set Polish in the Language Bar.
# The final letter is the apostrophe key on a US English keyboard.)
[int[]][char[]]$x
# output is: 100 117 112 261 (in hex): 64 75 70 105
# the final character (261) is "Latin Small Letter A with Ogonek"
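
For the Ruby side, here is a minimal sketch of why the code page that is assumed for the input matters. It is not a confirmed fix for the truncated STDIN.gets result in the question. The helper read_line_as_utf8 is a hypothetical name introduced only for illustration, and StringIO stands in for the real console so the example runs anywhere. The two code pages are the ones that appear in the encoding dumps above (852 and 1250).

```ruby
# encoding: UTF-8
require 'stringio'

# Hypothetical helper (not from the original post): read one line of raw
# bytes from io, label it with the given code page, and transcode to UTF-8.
def read_line_as_utf8(io, code_page)
  raw = io.gets
  return nil if raw.nil?
  raw.dup.force_encoding(code_page).encode('UTF-8').chomp
end

# The same text is represented by different bytes in the two code pages.
cp1250_bytes = "dupą".encode('Windows-1250').bytes.to_a  # last byte is 0xB9
cp852_bytes  = "dupą".encode('IBM852').bytes.to_a        # last byte is 0xA5

# Simulated console input: the bytes a code page 852 console would deliver.
fake_stdin = StringIO.new("dupą\n".encode('IBM852').force_encoding('BINARY'))
line = read_line_as_utf8(fake_stdin, 'IBM852')
puts line.bytes.to_a.inspect
puts line.eql?("dupą")

# Decoding code page 852 bytes as Windows-1250 gives a different letter,
# because byte 0xA5 is an uppercase A with ogonek in Windows-1250.
wrong = cp852_bytes.pack('C*').force_encoding('Windows-1250').encode('UTF-8')
puts wrong
```

If the console really runs in code page 852, then STDIN.set_encoding("CP1250", "UTF-8") describes the wrong source encoding, so it may be worth checking which code page the console reports before choosing the name passed to force_encoding or set_encoding.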