Converting non-Unicode to Unicode

Time: 2012-07-13 06:30:59

Tags: vb.net unicode

I am trying to convert a non-Unicode string like '¹ûº¤¢¤¤ì©2' into Unicode, 'ໃຊ້ໃນຄົວເຮືອນ' (Lao). I tried the code below, but it returns '??????'. Any idea how to convert the string?

Public Shared Function ConvertAsciiToUnicode(asciiString As String) As String
    ' Create two different encodings.
    Dim encAscii As Encoding = Encoding.ASCII
    Dim encUnicode As Encoding = Encoding.Unicode

    ' Convert the string into a byte[].
    Dim asciiBytes As Byte() = encAscii.GetBytes(asciiString)

    ' Perform the conversion from one encoding to the other.
    Dim unicodeBytes As Byte() = Encoding.Convert(encAscii, encUnicode, asciiBytes)

    ' Convert the new byte[] into a char[] and then into a string.
    ' This is a slightly different approach to converting to illustrate
    ' the use of GetCharCount/GetChars.
    Dim unicodeChars As Char() = New Char(encUnicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length) - 1) {}
    encUnicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, unicodeChars, 0)
    Dim unicodeString As New String(unicodeChars)

    ' Return the new unicode string
    Return unicodeString
End Function

1 answer:

Answer 0 (score: 4)

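Your 8-bit encoded Lao text is not ASCII. It is in some code page, such as IBM CP1133 or Microsoft LC0454, or quite possibly the Thai code page 874. You have to find out which one it is.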
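
One way to narrow down the code page is to decode the same raw bytes with each candidate and compare the results by eye. The sketch below is only an illustration of that idea: `CodePageProbe` and `DecodeWithCandidates` are made-up names, and the candidate list is whatever you decide to pass in. Code pages that the runtime does not provide are skipped rather than treated as errors.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Public Module CodePageProbe
    ' Hypothetical helper: decodes the same bytes with every candidate
    ' code page and returns the results keyed by code page number.
    Public Function DecodeWithCandidates(raw As Byte(), codePages As IEnumerable(Of Integer)) As Dictionary(Of Integer, String)
        Dim results As New Dictionary(Of Integer, String)()
        For Each cp As Integer In codePages
            Try
                results(cp) = Encoding.GetEncoding(cp).GetString(raw)
            Catch ex As ArgumentException
                ' Invalid or unknown code page number: skip it.
            Catch ex As NotSupportedException
                ' Code page not available on this platform: skip it.
            End Try
        Next
        Return results
    End Function
End Module
```

A call such as `CodePageProbe.DecodeWithCandidates(rawBytes, New Integer() {874, 1252})` returns one decoded string per available candidate. Here `rawBytes` stands for your original undecoded input. Whichever result reads as proper Lao points to the right code page.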

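What matters is how you obtain (read, receive, compute) the input string. Once you have it as a String, it is already Unicode and can easily be written out as UTF-8, for example like this: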

Dim writer As New StreamWriter("myfile.txt", True, System.Text.Encoding.UTF8)
writer.Write(mystring)
writer.Close()
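
The reading side works the same way in reverse: if the legacy bytes are decoded with the right code page as they are read, the resulting String is correct Unicode from the start. A minimal sketch, assuming the data comes from a file and that you already know its code page (both are assumptions; `LegacyReader` and `ReadLegacyText` are made-up names):

```vbnet
Imports System.IO
Imports System.Text

Public Module LegacyReader
    ' Reads a whole file, decoding its bytes with the given code page
    ' instead of the StreamReader default (UTF-8).
    Public Function ReadLegacyText(filePath As String, codePage As Integer) As String
        Dim sourceEncoding As Encoding = Encoding.GetEncoding(codePage)
        Using reader As New StreamReader(filePath, sourceEncoding)
            Return reader.ReadToEnd()
        End Using
    End Function
End Module
```

For example, `LegacyReader.ReadLegacyText("input.txt", 874)` would decode a file as code page 874; the file name is a placeholder.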

Here is the whole conversion done in memory:

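' Despite its name, utf8_input holds the raw 8-bit bytes in the source code page.
Dim utf8_input As Byte()
...
' Convert from code page 874 to UTF-8; the result is a UTF-8 byte array.
Dim converted As Byte() = Encoding.Convert(Encoding.GetEncoding(874), Encoding.UTF8, utf8_input)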

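The number 874 is the code page number of your input. Whether a particular operating system installation supports this code page is another question, but if you used it just to write your Stack Overflow question, your own system almost certainly supports it.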
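
Whether a code page is available can also be checked at run time instead of assumed. The sketch below shows one way to do that, plus a repair step for text that has already been decoded with the wrong code page, which appears to be the situation in the question. All names here (`CodePageSupport`, `TryGetEncoding`, `Redecode`) are hypothetical. Note that on .NET Core and later, legacy code pages such as 874 only become available after calling `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`.

```vbnet
Imports System
Imports System.Text

Public Module CodePageSupport
    ' Returns the encoding for a code page, or Nothing when this runtime
    ' or operating system does not provide it.
    Public Function TryGetEncoding(codePage As Integer) As Encoding
        Try
            Return Encoding.GetEncoding(codePage)
        Catch ex As ArgumentException
            Return Nothing
        Catch ex As NotSupportedException
            Return Nothing
        End Try
    End Function

    ' Hypothetical repair step: turn the garbled string back into bytes
    ' using the code page it was wrongly decoded with, then decode those
    ' bytes with the code page they were really written in.
    ' Returns Nothing if either code page is unavailable.
    Public Function Redecode(garbled As String, wrongCodePage As Integer, actualCodePage As Integer) As String
        Dim wrong As Encoding = TryGetEncoding(wrongCodePage)
        Dim actual As Encoding = TryGetEncoding(actualCodePage)
        If wrong Is Nothing OrElse actual Is Nothing Then Return Nothing
        Return actual.GetString(wrong.GetBytes(garbled))
    End Function
End Module
```

For the string in the question, a first attempt might be `CodePageSupport.Redecode(s, 1252, 874)`, but both numbers are guesses: 1252 assumes the text was mis-decoded as Western European, and 874 is only one of the candidates named above. The round trip works only if the wrong decoding did not lose any bytes along the way.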