Question

I have a file that contains non-ASCII characters. When I save this file with a FileStream, the characters in the file are not what I expect.

I write

stream
BT 38.3774 710 TD /F10 12.0000 Tf (België)Tj ET
endstream

but the content in the file is

stream
BT 38.3774 710 TD /F10 12.0000 Tf (BelgiÃ«)Tj ET
endstream

Before the string is saved to the file with FileStream.Write, it is encoded to bytes as UTF-8.
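To see where `Ã«` comes from, here is a minimal sketch (a hypothetical console program, not the asker's code) that UTF-8 encodes the string and then decodes those same bytes with a single-byte code page. It assumes, based on the output above, that whatever opens the file reads it as one byte per character. ISO-8859-1 is used because it is built into both .NET Framework and .NET Core, and it agrees with Windows-1252 for the two bytes involved. `MisreadUtf8AsLatin1` is a hypothetical helper name.

```vbnet
Imports System
Imports System.Text

Module MojibakeDemo
    ' Hypothetical helper: encode as UTF-8, then misread the bytes as ISO-8859-1
    Function MisreadUtf8AsLatin1(text As String) As String
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(text)
        ' A single-byte encoding turns every byte into exactly one character
        Return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes)
    End Function

    Sub Main()
        Dim content As String = "België"
        ' "ë" is U+00EB; UTF-8 stores it as the two bytes C3 AB
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(content))) ' 42-65-6C-67-69-C3-AB
        ' C3 is "Ã" and AB is "«" in ISO-8859-1 / Windows-1252
        Console.WriteLine(MisreadUtf8AsLatin1(content))
    End Sub
End Module
```

The string is not damaged when it is written; the two bytes of `ë` are simply displayed as two separate characters by a reader that expects one byte per character.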

Can someone help me understand why this happens?

I have been able to reproduce the result with a short piece of code:

Imports System.IO
Imports System.Text

Module Repro
    Sub Main()
        Using newFile As New FileStream("C:\Users\Sed\Documents\test.txt", FileMode.Create)
            Dim content As String = "België"
            Dim contentByte As Byte() = New UTF32Encoding().GetBytes(content) ' 4 bytes per character, little-endian
            newFile.Write(contentByte, 0, contentByte.Length)
            contentByte = New UTF8Encoding().GetBytes(content) ' "ë" becomes 2 bytes, ASCII letters stay 1 byte
            newFile.Write(contentByte, 0, contentByte.Length)
        End Using
    End Sub
End Module

which gives the result

B   e   l   g   i   ë   BelgiÃ«
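The output above can be explained by dumping the bytes each encoder produces. This is a minimal sketch with a hypothetical `HexBytes` helper; it assumes, as the output suggests, that the viewer shows the file one byte per character, so the three zero bytes that UTF-32 adds to each ASCII letter appear as gaps.

```vbnet
Imports System
Imports System.Text

Module ByteLayoutDemo
    ' Hypothetical helper: the bytes an encoding produces, as hex pairs
    Function HexBytes(enc As Encoding, text As String) As String
        Return BitConverter.ToString(enc.GetBytes(text))
    End Function

    Sub Main()
        Dim content As String = "België"
        ' New UTF32Encoding() is little-endian: "B" -> 42-00-00-00, "ë" -> EB-00-00-00.
        ' A one-byte-per-character viewer shows the 00 bytes as gaps and EB as "ë".
        Console.WriteLine(HexBytes(New UTF32Encoding(), content))
        ' UTF-8: "ë" -> C3-AB, which the same viewer shows as "Ã«"
        Console.WriteLine(HexBytes(New UTF8Encoding(), content))
    End Sub
End Module
```

In other words, the FileStream writes exactly the bytes it is given in both cases; only the UTF-32 half happens to contain `ë` as the single byte EB that the viewer understands.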

So my guess is that the FileStream somehow assumes UTF-32, while the content of the file is written as UTF-8...

Encoding everything as UTF-32 does not solve it either. The file is then completely messed up...

I still don't understand why this happens, but I have a workaround in mind that I need to explore.

Answer 1

I've figured it out...

The way I create the file, the encoding it uses is ANSI, i.e. Encoding.Default.

So changing

Dim newObjectByte As Byte() = New UTF8Encoding(True).GetBytes(DataObject("pdfObjectString").ToString())

to

Dim newObjectByte As Byte() = Encoding.Default.GetBytes(DataObject("pdfObjectString").ToString())

solved my problem by using the code page.
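The change works because a single-byte code page stores `ë` as the one byte the reader expects. One caveat: `Encoding.Default` is the system ANSI code page only on .NET Framework; on .NET Core and .NET 5+ it returns UTF-8, so the same line would bring the original problem back. The sketch below names the code page explicitly instead. It assumes the target text is Windows-1252 compatible and uses ISO-8859-1, which every .NET runtime provides without extra setup (Windows-1252 itself needs `CodePagesEncodingProvider` registered on .NET Core) and which gives the same byte for `ë`. `ToSingleByte`, the sample string variable, and the output path are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module SingleByteWrite
    ' Hypothetical helper: encode with an explicit single-byte code page.
    ' Characters outside the code page are replaced with "?" by the default fallback.
    Function ToSingleByte(text As String) As Byte()
        Return Encoding.GetEncoding("iso-8859-1").GetBytes(text)
    End Function

    Sub Main()
        ' Hypothetical stand-in for DataObject("pdfObjectString").ToString()
        Dim pdfObjectString As String = "BT 38.3774 710 TD /F10 12.0000 Tf (België)Tj ET"
        Dim newObjectByte As Byte() = ToSingleByte(pdfObjectString)
        ' Hypothetical output path
        Using newFile As New FileStream("test.txt", FileMode.Create)
            newFile.Write(newObjectByte, 0, newObjectByte.Length)
        End Using
        ' One byte per character, so "ë" is written as the single byte EB
        Console.WriteLine(newObjectByte.Length = pdfObjectString.Length)
    End Sub
End Module
```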

Thanks to The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) for getting me to think about code pages, ANSI, ASCII and all that...

FileStream encoding not as expected
