Question

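I have a test program that demonstrates the end result I am hoping for (even though, in this test program, the steps seem unnecessary).

The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.

I then read this file and write it to a new file.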

//Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(@"C:\mydata.dat"))
{
    compressedFile = reader.ReadToEnd();
    reader.Close();
    reader.Dispose();
}

//Write to a new file
using (StreamWriter file = new StreamWriter(@"C:\mynewdata.dat"))
{
    file.WriteLine(compressedFile);
}

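When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with the message: The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.

Why are these files different?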

Answer 1

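StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating a compressed file as a stream of characters doesn't make sense, you should use some implementation of Stream instead. If you want to get the stream as an array of bytes, you can use MemoryStream.

The exact reason why character streams don't work here is that they assume UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the GZip header, 0x8B), it is decoded as the Unicode replacement character (U+FFFD). When the string is written back, that character is encoded in UTF-8 as bytes that are completely different from those in the source.

For example, to read a file from a stream, get it as an array of bytes, and then write it to another file as a stream: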
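
The effect can be reproduced in memory without touching any file. The following is a minimal sketch, not part of the original answer: the class and method names (Utf8RoundTripDemo, RoundTrip) are made up for illustration, and it uses Encoding.UTF8 directly on the assumption that this matches the default decoding and encoding that StreamReader and StreamWriter apply.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class Utf8RoundTripDemo
{
    // Decodes raw bytes as UTF-8 text and encodes the text back to bytes,
    // which is roughly what StreamReader followed by StreamWriter does to a file.
    public static byte[] RoundTrip(byte[] original)
    {
        // Invalid UTF-8 sequences are replaced with U+FFFD during decoding.
        string asText = Encoding.UTF8.GetString(original);

        // U+FFFD is then encoded as the three bytes EF BF BD.
        return Encoding.UTF8.GetBytes(asText);
    }
}
```

Calling Utf8RoundTripDemo.RoundTrip(new byte[] { 0x1F, 0x8B, 0x08 }) and formatting the result with BitConverter.ToString gives 1F-EF-BF-BD-08: the 0x8B byte of the magic number is gone, which is why GZipStream rejects the new file.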

byte[] bytes;
using (var fileStream = new FileStream(@"C:\mydata.dat", FileMode.Open))
using (var memoryStream = new MemoryStream())
{
    fileStream.CopyTo(memoryStream);
    bytes = memoryStream.ToArray();
}

using (var memoryStream = new MemoryStream(bytes))
using (var fileStream = new FileStream(@"C:\mynewdata.dat", FileMode.Create))
{
    memoryStream.CopyTo(fileStream);
}

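The CopyTo() method is only available in .NET 4 and later, but you can write your own if you are on an older version.

Of course, for this simple example there is no need to use streams at all. You can simply do: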
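
A hand-written replacement could look roughly like the sketch below. The names StreamCopyHelper and CopyStream and the 4096-byte buffer size are assumptions chosen for illustration, not something taken from the linked answer or from the framework.

```csharp
using System.IO;

// Hypothetical helper for frameworks that lack Stream.CopyTo().
public static class StreamCopyHelper
{
    // Copies all remaining bytes from input to output in fixed-size chunks.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[4096]; // buffer size is an arbitrary choice
        int read;

        // Read returns 0 once the end of the input stream is reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were actually read in this pass.
            output.Write(buffer, 0, read);
        }
    }
}
```

In the example above, fileStream.CopyTo(memoryStream) would then become StreamCopyHelper.CopyStream(fileStream, memoryStream).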

byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
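
If you want a quick sanity check that a copy still starts with a valid GZip header, you can look at its first two bytes, which the GZip format defines as 0x1F 0x8B. This is only a sketch: GZipHeaderCheck and HasGZipMagic are hypothetical names, and passing the check does not prove that the rest of the file is intact.

```csharp
// Hypothetical helper; checks only the two-byte GZip magic number.
public static class GZipHeaderCheck
{
    // Returns true when the data begins with the GZip magic number 0x1F 0x8B.
    public static bool HasGZipMagic(byte[] data)
    {
        return data != null
            && data.Length >= 2
            && data[0] == 0x1F
            && data[1] == 0x8B;
    }
}
```

For example, GZipHeaderCheck.HasGZipMagic(File.ReadAllBytes(@"C:\mynewdata.dat")) should return true for a byte-for-byte copy and false for a copy made through StreamReader and StreamWriter.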

Answer 2

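EDIT: Apparently my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been refactored to the point where no extra performance could possibly be achieved (otherwise, that would mean they are just as invalid as mine).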

using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    using (System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\mynewdata.dat"))
    {
        byte[] bytes = new byte[1024];
        int count = 0;
        while((count = sr.BaseStream.Read(bytes, 0, bytes.Length)) > 0){
            sw.BaseStream.Write(bytes, 0, count);
        }
    }
}

Read all bytes

byte[] bytes = null;
using (System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\mydata.dat"))
{
    bytes = new byte[sr.BaseStream.Length];
    int index = 0;
    int count = 0;
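    // Never request more bytes than remain in the buffer; otherwise Read throws an ArgumentException.
    while((count = sr.BaseStream.Read(bytes, index, System.Math.Min(1024, bytes.Length - index))) > 0){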
        index += count;
    }
}

Read all bytes / write all bytes (from svick's answer):

byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
File.WriteAllBytes(@"C:\mynewdata.dat", bytes);

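Performance test against the other answer:

I just ran a quick test between my answer (StreamReader; the first part above, the file copy) and svick's answer (FileStream/MemoryStream; the first part). Each test is 1000 iterations of the code. Here are the results of 4 tests (results are in whole seconds; all actual results were slightly above these values):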

My Code | svick code
--------------------
9       | 12
9       | 14
8       | 13
8       | 14

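As you can see, in my tests at least, my code performed better. One thing worth noting about my code is that I am not reading a character stream; I am actually accessing the BaseStream, which provides a byte stream. Perhaps svick's answer is slower because it uses two streams for reading and then two more for writing. Of course, svick's answer could be optimised a lot to improve performance (and he also provided an alternative for the simple file copy).

Testing with the third option (ReadAllBytes/WriteAllBytes):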

My Code | svick code | 3rd
----------------------------
8       | 14         | 7
9       | 18         | 9
9       | 17         | 8
9       | 17         | 9

Note: measured in milliseconds, the third option was always faster.
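
The test harness itself was not posted, so the following is only a sketch of how such a comparison might be timed. CopyBenchmark and TimeIterations are hypothetical names, and the use of Stopwatch is an assumption about a reasonable way to measure, not a description of how the numbers above were obtained.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper; not the harness used for the tables above.
public static class CopyBenchmark
{
    // Runs the given action the requested number of times and
    // returns the total elapsed time in milliseconds.
    public static long TimeIterations(Action action, int iterations)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Each variant can then be wrapped in a lambda, for example CopyBenchmark.TimeIterations(() => File.WriteAllBytes(@"C:\mynewdata.dat", File.ReadAllBytes(@"C:\mydata.dat")), 1000). Results from a loop like this depend heavily on disk caching and file size, so they are best treated as rough indications.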

Reading a compressed file and writing it to a new file prevents decompression
