Reading a Unicode string from a text file in a UWP app

Time: 2016-02-09 15:46:33

Tags: c# windows-10 uwp windows-10-mobile windows-10-universal

In a Windows 10 app, I am trying to read a string from a .txt file and set it as the text of a RichEditBox:

Code variant 1:

var read = await FileIO.ReadTextAsync(file, Windows.Storage.Streams.UnicodeEncoding.Utf8);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, read);

Code variant 2:

var stream = await file.OpenAsync(Windows.Storage.FileAccessMode.ReadWrite);
ulong size = stream.Size;
using (var inputStream = stream.GetInputStreamAt(0))
{
    using (var dataReader = new Windows.Storage.Streams.DataReader(inputStream))
    {
        dataReader.UnicodeEncoding = Windows.Storage.Streams.UnicodeEncoding.Utf8;
        uint numBytesLoaded = await dataReader.LoadAsync((uint)size);
        string text = dataReader.ReadString(numBytesLoaded);
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.FormatRtf, text);
    }
}

With some files I get this error: "No mapping for the Unicode character exists in the target multi-byte code page".

I found a workaround:

IBuffer buffer = await FileIO.ReadBufferAsync(file);
DataReader reader = DataReader.FromBuffer(buffer);
byte[] fileContent = new byte[reader.UnconsumedBufferLength];
reader.ReadBytes(fileContent);
string text = Encoding.UTF8.GetString(fileContent, 0, fileContent.Length);
txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);

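But with this code, the text is displayed as question marks inside diamonds.

How can I read and display these text files with the correct encoding?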

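3 Answers:

Answer 0: (score: 3)

The challenge here is the encoding, and the right approach depends on how much accuracy your app needs. If you need something quick and simple, you can adapt this answer: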

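    // Namespaces used: System, System.Text, Windows.Storage,
    // Windows.Storage.Pickers, Windows.Storage.Streams
    public static Encoding GetEncoding(byte[] bom)
    {
        // Analyze the BOM (expects an array of at least 4 bytes)
        if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7;
        if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
        // FF FE 00 00 must be tested before UTF-16LE, which shares the FF FE prefix
        if (bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; //UTF-32LE
        if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
        if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
        if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); //UTF-32BE
        return Encoding.ASCII; // no BOM found
    }

    async System.Threading.Tasks.Task MyMethod()
    {
        FileOpenPicker openPicker = new FileOpenPicker();
        openPicker.FileTypeFilter.Add(".txt"); // the picker needs at least one filter entry
        StorageFile file = await openPicker.PickSingleFileAsync();
        if (file == null) return; // the user cancelled the picker
        IBuffer buffer = await FileIO.ReadBufferAsync(file);
        DataReader reader = DataReader.FromBuffer(buffer);
        byte[] fileContent = new byte[reader.UnconsumedBufferLength];
        reader.ReadBytes(fileContent);
        // Copy up to 4 leading bytes so that files shorter than 4 bytes do not throw
        byte[] bom = new byte[4];
        Array.Copy(fileContent, bom, Math.Min(4, fileContent.Length));
        string text = GetEncoding(bom).GetString(fileContent, 0, fileContent.Length);
        txt.Document.SetText(Windows.UI.Text.TextSetOptions.None, text);

        //.. 
    }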

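If you need something more accurate, consider porting the .NET port of the Mozilla charset detector to UWP, as described in this answer.

Note that the code above is only a sample: it lacks the using statements for the types that implement IDisposable, and it should be written in a more consistent way.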
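One detail the sample above does not handle: Encoding.GetString does not remove the BOM, so text decoded from a file that starts with one keeps a leading U+FEFF character. A minimal sketch of skipping it, using only System.Text, might look like the following. The class and method names (BomStripping, DecodeWithoutBom, HasPrefix) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Text;

public static class BomStripping
{
    // Decodes bytes with the given encoding, skipping that encoding's BOM
    // (its "preamble") if the data starts with it.
    public static string DecodeWithoutBom(Encoding encoding, byte[] bytes)
    {
        byte[] preamble = encoding.GetPreamble();
        int offset = HasPrefix(bytes, preamble) ? preamble.Length : 0;
        return encoding.GetString(bytes, offset, bytes.Length - offset);
    }

    // Returns true when data begins with every byte of prefix.
    private static bool HasPrefix(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        // UTF-8 BOM (EF BB BF) followed by "Hi".
        byte[] utf8WithBom = { 0xEF, 0xBB, 0xBF, 0x48, 0x69 };
        string raw = Encoding.UTF8.GetString(utf8WithBom, 0, utf8WithBom.Length);
        string clean = DecodeWithoutBom(Encoding.UTF8, utf8WithBom);
        Console.WriteLine(raw.Length);   // 3: U+FEFF, 'H', 'i'
        Console.WriteLine(clean.Length); // 2: 'H', 'i'
    }
}
```

In the sample above, the call to GetString on the detected encoding could be routed through a helper like this one.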

HTH -g

Answer 1: (score: 0)

Solution:

1) I built a port of the Mozilla Universal Charset Detector for UWP (added to NuGet):

ICharsetDetector cdet = new CharsetDetector();
cdet.Feed(fileContent, 0, fileContent.Length);
cdet.DataEnd();

2) The NuGet library Portable.Text.Encoding:

string text = null;
if (cdet.Charset != null)
    text = Portable.Text.Encoding.GetEncoding(cdet.Charset).GetString(fileContent, 0, fileContent.Length);

That's all. Now Unicode encodings, and also cp1251 and cp1252, work well.
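As a cheap check before (or alongside) a charset detector, the bytes can be tested for being valid UTF-8. The diamond question marks from the question are U+FFFD replacement characters, which the default UTF-8 decoder emits for invalid byte sequences. A UTF8Encoding constructed with throwOnInvalidBytes set to true throws instead. The sketch below assumes plain System.Text, and the names StrictUtf8 and TryDecode are hypothetical.

```csharp
using System;
using System.Text;

public static class StrictUtf8
{
    // Second argument (throwOnInvalidBytes: true) makes the decoder throw
    // a DecoderFallbackException instead of emitting U+FFFD.
    private static readonly Encoding Strict = new UTF8Encoding(false, true);

    // Returns true and the decoded text only if bytes are well-formed UTF-8.
    public static bool TryDecode(byte[] bytes, out string text)
    {
        try
        {
            text = Strict.GetString(bytes, 0, bytes.Length);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }

    public static void Main()
    {
        // "Привет" encoded as Windows-1251, which is not valid UTF-8.
        byte[] cp1251 = { 0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2 };

        // The lenient decoder silently substitutes replacement characters.
        string lenient = Encoding.UTF8.GetString(cp1251, 0, cp1251.Length);
        Console.WriteLine(lenient.Contains("\uFFFD"));

        string text;
        Console.WriteLine(TryDecode(cp1251, out text));
        Console.WriteLine(TryDecode(Encoding.UTF8.GetBytes("Привет"), out text));
    }
}
```

When TryDecode returns false, the file is in some other encoding, and that is the point where a detector such as the one above becomes useful.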

Answer 2: (score: 0)

        StorageFile file = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Assets/FontFiles/" + fileName));
        using (var inputStream = await file.OpenReadAsync())
        using (var classicStream = inputStream.AsStreamForRead())
        using (var streamReader = new StreamReader(classicStream))
        {
            // Read the file line by line until the end of the stream
            while (streamReader.Peek() >= 0)
            {
                string line = streamReader.ReadLine();
            }
        }
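A StreamReader constructed from only a stream, as in the answer above, assumes UTF-8 and switches encoding if the data starts with a byte order mark. The sketch below spells that behavior out with the explicit constructor overload and an in-memory stream, so it runs outside UWP. The names BomAwareReading and ReadAllText are illustrative only. Files in a legacy code page without a BOM would still be misread this way.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomAwareReading
{
    // Decodes a whole stream as UTF-8 unless a BOM at the start indicates
    // another Unicode encoding, in which case StreamReader switches to it.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // UTF-16LE BOM (FF FE) followed by "Hi" encoded as UTF-16LE.
        byte[] utf16 = { 0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00 };
        Console.WriteLine(ReadAllText(new MemoryStream(utf16))); // prints "Hi"
    }
}
```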