Question

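I have a process that reads data in the Windows-1252 code page (input from a SQL Server 2008 varchar field). I then write this data to a flat text file, which is picked up by an IBM mainframe system that uses the EBCDIC 37 code page. That system converts the file to its own character set. However, some content in the extended ASCII range (character codes 128-255) is not converted well by the mainframe. I think this is because some characters in the Windows character set do not exist in the EBCDIC character set.

Is there a general way to determine which characters I need to filter out, such as the left single quote, right single quote, left double quote, right double quote, bullet, en dash, and em dash (Windows codes 145 to 151, respectively), to name a few? If so, is there an algorithm I can use to determine the closest EBCDIC equivalent (for example, a plain single quote for a left or right single quote)?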

Answer 1

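I was looking for a general approach to this problem rather than one focused only on EBCDIC 37, and I did not want to compare two code charts by eye. I wrote a short program (in VB.NET) that finds all the characters that exist in one code page but not in the other.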

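Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

' Pick source and target codepages.
' Request 1252 explicitly: Encoding.Default depends on the system locale
' (and on .NET Core / .NET 5+ code pages need CodePagesEncodingProvider registered).
Dim sourceEncoding As Encoding = Encoding.GetEncoding(1252)
Dim targetEncoding As Encoding = Encoding.GetEncoding("IBM037")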

' Get every character in the codepage.
Dim inbytes(255) As Byte ' VB bounds are inclusive: indexes 0..255 give 256 elements.
For code As Integer = 0 To 255
    inbytes(code) = Convert.ToByte(code)
Next

' Convert the bytes from the source encoding to the target, then back again.
' Those bytes that convert back to the original value exist in both codepages.
' The bytes that change do not exist in the target encoding.
Dim input As String = sourceEncoding.GetString(inbytes)
Dim outbytes As Byte() = Encoding.Convert(sourceEncoding, targetEncoding, inbytes)
Dim convertedbytes As Byte() = Encoding.Convert(targetEncoding, sourceEncoding, outbytes)
Dim output As String = sourceEncoding.GetString(convertedbytes)
Dim diffs As New List(Of Char)()
For idx As Integer = 0 To input.Length - 1
    If input(idx) <> output(idx) Then
        diffs.Add(input(idx))
    End If
Next

' Print results.
Console.WriteLine("Source: " + input)
Console.WriteLine("(Coded): " + String.Join(" ", inbytes.Select(Function (x) Convert.ToInt32(x).ToString()).ToArray()))
Console.WriteLine()
Console.WriteLine("Target: " + output)
Console.WriteLine("(Coded): " + String.Join(" ", convertedbytes.Select(Function (x) Convert.ToInt32(x).ToString()).ToArray()))
Console.WriteLine()
Console.WriteLine("Cannot convert: " + String.Join(" ", diffs.Select(Function (x) Convert.ToInt32(x).ToString()).ToArray()))

For the Windows-1252 to EBCDIC 37 case, 27 characters have no mapping. For each of those characters I chose the substitute I thought fit best.
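The substitution step might look something like the sketch below. The module name `Cp1252Fallback`, the function `Sanitize`, and every replacement in the table are hypothetical choices made for illustration, not the substitutes actually picked above. The table covers only a handful of the unmapped characters and would need extending to all 27.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module Cp1252Fallback
    ' Hypothetical substitution table. These replacements are illustrative
    ' assumptions and cover only some of the unmapped characters.
    Private ReadOnly Substitutes As New Dictionary(Of Char, String) From {
        {ChrW(&H2018), "'"},
        {ChrW(&H2019), "'"},
        {ChrW(&H201C), """"},
        {ChrW(&H201D), """"},
        {ChrW(&H2022), "*"},
        {ChrW(&H2013), "-"},
        {ChrW(&H2014), "-"},
        {ChrW(&H2026), "..."},
        {ChrW(&H2122), "(TM)"}
    }

    ' Replace every character found in the table; copy all others unchanged.
    Public Function Sanitize(text As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each ch As Char In text
            Dim replacement As String = Nothing
            If Substitutes.TryGetValue(ch, replacement) Then
                sb.Append(replacement)
            Else
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

Calling `Cp1252Fallback.Sanitize` on each string before writing the flat file leaves only characters that the round-trip test reported as convertible, assuming the table has been filled in for every character that test flagged.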
