Question

I have a file that contains plain text mixed with some compressed text, for example:

Version 01
Maker SomeCompany

l 73
mark
h�22V0P���w�/�+Q0���L)�66□ // This line was compressed using DeflateZLib
endmark

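It seems Microsoft has a solution, the DeflateStream class, but their examples show how to use it on an entire file, and I can't figure out how to use it on just one line of my file.

So far I have the following: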

bool isDeflate = false;

using (var fs = new FileStream(@"C:\Temp\MyFile.dat", FileMode.Open))
using (var reader = new StreamReader(fs))
{
     string line;
     while ((line = reader.ReadLine()) != null)
     {
         if (isDeflate)
         {
             if (line == "endmark")
             {
                 isDeflate = false;
             }
             else
             {
                 line = DeflateSomehow(line);
             }
         }

         if (line == "mark")
         {
             isDeflate = true;
         }

         Console.WriteLine(line);
     }
}

public string DeflateSomehow(string line)
{
    // How do I deflate just that string?
}

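Since the file isn't created by us (we only read it), we have no control over its structure... However, I'm not tied to my current code. If I need to change more than just the implementation of the DeflateSomehow method, I'm fine with that too.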

Answer 1

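A deflate stream works on binary data. An arbitrary binary blob in the middle of a text file is also known as: a corrupt text file. There is no sane way of decoding this:

you can't read "lines", because there is no definition of a "line" when talking about binary data; any combination of CR / LF / CRLF / etc. can occur completely at random in binary data
you can't read "string lines", because that means you are running the data through an Encoding; but since this isn't text data, again: that will just give you gibberish that cannot be processed (data is lost as it is read)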
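To make the second point concrete, here is a minimal sketch of what happens when binary bytes are pushed through an Encoding and back. The payload bytes and the EncodingLossDemo / RoundTrip names are made up for illustration; they are not taken from the asker's file.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingLossDemo
{
    // Decodes the bytes as text and encodes the text again, which is roughly
    // what happens when binary data goes through StreamReader and back out.
    public static byte[] RoundTrip(byte[] data, Encoding encoding)
    {
        return encoding.GetBytes(encoding.GetString(data));
    }

    static void Main()
    {
        // Hypothetical payload: bytes that are not valid ASCII or UTF-8 text.
        byte[] payload = { 0x68, 0xDE, 0x32, 0x32, 0xFF, 0x0D, 0x0A, 0x80 };

        byte[] viaUtf8 = RoundTrip(payload, Encoding.UTF8);
        byte[] viaAscii = RoundTrip(payload, Encoding.ASCII);

        // Report whether the bytes survived each round trip unchanged.
        Console.WriteLine("UTF-8 round trip intact: " + payload.SequenceEqual(viaUtf8));
        Console.WriteLine("ASCII round trip intact: " + payload.SequenceEqual(viaAscii));
    }
}
```

Neither round trip should come back intact: bytes that are invalid in the chosen encoding get swapped for a replacement character, and the original values cannot be recovered afterwards.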

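Now, the second of those two problems can be solved by reading via the Stream API rather than the StreamReader API, so that you are only ever reading binary; you would then need to look for the line endings yourself, using an Encoding to probe what you can (note that this isn't as simple as it sounds if you are using a multi-byte / variable-width encoding such as UTF-8).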

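However, the first of those two problems is inherently not solvable. To do it reliably you would need some kind of binary framing protocol, which, again, does not exist in a text file. It looks like the example is using "mark" and "endmark"; again, there is technically a chance that these will occur at random in the data, but you will probably get away with it in 99.999% of cases. The trick, then, would be to read the entire file manually using Stream and Encoding, looking for "mark" and "endmark", and separating the bits that are encoded as text from the bits that are compressed data. Then run the encoded-as-text pieces through the correct Encoding.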
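As a rough sketch of that sentinel approach, the following works purely on bytes and cuts out whatever sits between a "mark" line and the next "endmark" line. It assumes ASCII text, CRLF line endings, and a "mark" line that is not the very first line of the file; SentinelScan, IndexOf and ExtractPayload are illustrative names only. It has exactly the weakness described above: if the compressed bytes happen to contain CRLF followed by "endmark", it cuts too early.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class SentinelScan
{
    // Returns the index of the first occurrence of pattern at or after start, or -1.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns the bytes between the "mark" line and the next "endmark" line,
    // or null if either sentinel is missing.
    public static byte[] ExtractPayload(byte[] file)
    {
        byte[] open = Encoding.ASCII.GetBytes("\r\nmark\r\n");
        byte[] close = Encoding.ASCII.GetBytes("\r\nendmark");

        int openAt = IndexOf(file, open, 0);
        if (openAt < 0) return null;
        int payloadStart = openAt + open.Length;

        int closeAt = IndexOf(file, close, payloadStart);
        if (closeAt < 0) return null;

        byte[] payload = new byte[closeAt - payloadStart];
        Array.Copy(file, payloadStart, payload, 0, payload.Length);
        return payload;
    }

    static void Main()
    {
        // Build a small fake file: text header, four "binary" bytes, text footer.
        var file = new List<byte>();
        file.AddRange(Encoding.ASCII.GetBytes("Version 01\r\nl 4\r\nmark\r\n"));
        file.AddRange(new byte[] { 0x01, 0x02, 0x0D, 0x0A });
        file.AddRange(Encoding.ASCII.GetBytes("\r\nendmark\r\n"));

        byte[] payload = ExtractPayload(file.ToArray());
        Console.WriteLine(BitConverter.ToString(payload)); // prints 01-02-0D-0A
    }
}
```

Note that the fake payload deliberately ends in CR LF, which is the kind of byte sequence that makes line-based reading unsafe here.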

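However! Once you are reading binary, it is simple: you just buffer the right amount (using whatever framing / sentinel protocol the data was written with), and use something like: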

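// 'bytes' holds only the compressed payload, already cut out of the file
using(var ms = new MemoryStream(bytes))
// DeflateStream rather than GZipStream: the data is deflate, and GZipStream expects a gzip header
using(var inflate = new DeflateStream(ms, CompressionMode.Decompress))
{
    // now read from 'inflate'
}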
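For completeness, here is a small self-contained round trip using the same MemoryStream + DeflateStream pattern. It compresses a string to a byte array and inflates it again. It assumes the text is ASCII; DeflateRoundTrip, Deflate and Inflate are names picked for this sketch.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class DeflateRoundTrip
{
    // Compresses ASCII text into a raw deflate byte array.
    public static byte[] Deflate(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                byte[] raw = Encoding.ASCII.GetBytes(text);
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream writes out the final block
            return output.ToArray();
        }
    }

    // Inflates a raw deflate byte array back into ASCII text.
    public static string Inflate(byte[] bytes)
    {
        using (var ms = new MemoryStream(bytes))
        using (var inflate = new DeflateStream(ms, CompressionMode.Decompress))
        using (var reader = new StreamReader(inflate, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        byte[] compressed = Deflate("some text some text some text");
        Console.WriteLine(compressed.Length + " compressed bytes");
        Console.WriteLine(Inflate(compressed));
    }
}
```

The important part is that the compressed data stays a byte[] the whole time; it is never turned into a string until after it has been inflated.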

With the addition of the "l 73" marker, and the information that the text is ASCII, it becomes a bit more viable.

This won't work for me, because the data as posted on SO is already corrupt (binary posted as text), but basically something like:

using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.RegularExpressions;
class Program
{
    static void Main()
    {
        using (var file = File.OpenRead("my.txt"))
        using (var buffer = new MemoryStream())
        {
            List<string> lines = new List<string>();
            string line;
            while ((line = ReadToCRLF(file, buffer)) != null)
            {
                lines.Add(line);
                Console.WriteLine(line);
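                // "mark" means compressed bytes follow; the line before it is
                // taken to be "l <n>", with n the payload length in bytes
                if (line == "mark" && lines.Count >= 2)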
                {
                    var match = Regex.Match(lines[lines.Count - 2], "^l ([0-9]+)$");
                    int bytes;
                    if (match.Success && int.TryParse(match.Groups[1].Value, out bytes))
                    {
                        ReadBytes(file, buffer, bytes);
                        string inflated = Inflate(buffer);
                        lines.Add(inflated); // or something similar
                        Console.WriteLine(inflated);
                    }
                }
            }
        }

    }
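    // Inflates everything in 'source' and decodes the result as ASCII;
    // 'source' is left open so the buffer can be reused
    static string Inflate(Stream source)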
    {
        using (var deflate = new DeflateStream(source, CompressionMode.Decompress, true))
        using (var reader = new StreamReader(deflate, Encoding.ASCII))
        {
            return reader.ReadToEnd();
        }
    }
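    // Reads exactly 'count' raw bytes from 'source' into 'buffer',
    // then rewinds 'buffer' so it can be read from the start
    static void ReadBytes(Stream source, MemoryStream buffer, int count)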
    {
        buffer.SetLength(count);
        int read, offset = 0;
        while (count > 0 && (read = source.Read(buffer.GetBuffer(), offset, count)) > 0)
        {
            count -= read;
            offset += read;
        }
        if (count != 0) throw new EndOfStreamException();
        buffer.Position = 0;
    }
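    // Reads bytes up to the next CRLF and decodes them as ASCII
    // (without the CRLF); returns null at end of file
    static string ReadToCRLF(Stream source, MemoryStream buffer)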
    {
        buffer.SetLength(0);
        int next;
        bool wasCr = false;
        while ((next = source.ReadByte()) >= 0)
        {
            if(next == 10 && wasCr) { // CRLF
                // end of line (minus the CR)
                return Encoding.ASCII.GetString(
                     buffer.GetBuffer(), 0, (int)buffer.Length - 1);
            }
            buffer.WriteByte((byte)next);
            wasCr = next == 13;
        }
        // end of file
        if (buffer.Length == 0) return null;
        return Encoding.ASCII.GetString(buffer.GetBuffer(), 0, (int)buffer.Length);

    }
}
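One more caveat, since the question's comment says "DeflateZLib": DeflateStream reads raw deflate data (RFC 1951), whereas zlib-wrapped data (RFC 1950) starts with a 2-byte header and ends with a 4-byte Adler-32 checksum. If the payload turns out to be zlib-wrapped, one common workaround is to skip the two header bytes before inflating. The sketch below does that under loud assumptions: the header check is only a heuristic (raw deflate data can occasionally look like a zlib header), it assumes no preset dictionary, and it does not verify the checksum. ZlibHelper, LooksLikeZlibHeader and InflateMaybeZlib are illustrative names. On .NET 6 or later, System.IO.Compression.ZLibStream may be the cleaner option.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class ZlibHelper
{
    // Heuristic: compression method 8 (deflate) and a header checksum divisible by 31.
    public static bool LooksLikeZlibHeader(byte[] data)
    {
        return data.Length >= 2
            && (data[0] & 0x0F) == 8
            && ((data[0] << 8) | data[1]) % 31 == 0;
    }

    // Inflates the payload, skipping a 2-byte zlib header if one seems to be present.
    public static byte[] InflateMaybeZlib(byte[] data)
    {
        int skip = LooksLikeZlibHeader(data) ? 2 : 0;
        using (var ms = new MemoryStream(data, skip, data.Length - skip))
        using (var inflate = new DeflateStream(ms, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            inflate.CopyTo(result);
            return result.ToArray();
        }
    }

    // Helper for the demo: produces raw deflate bytes from ASCII text.
    public static byte[] RawDeflate(string text)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                byte[] raw = Encoding.ASCII.GetBytes(text);
                deflate.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    static void Main()
    {
        byte[] raw = RawDeflate("hello hello hello hello");

        // Fake a zlib wrapper for the demo: header only, no Adler-32 trailer.
        byte[] wrapped = new byte[raw.Length + 2];
        wrapped[0] = 0x78;
        wrapped[1] = 0x9C; // a commonly seen zlib header
        Array.Copy(raw, 0, wrapped, 2, raw.Length);

        Console.WriteLine(Encoding.ASCII.GetString(InflateMaybeZlib(wrapped)));
        Console.WriteLine(Encoding.ASCII.GetString(InflateMaybeZlib(raw)));
    }
}
```

In the program above, this would mean inflating the buffered bytes through something like InflateMaybeZlib instead of handing them straight to DeflateStream. Whether the "l 73" count includes the header and trailer is something only the real file can tell you.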

How do I use the DeflateStream class on one line of a file?
