Question

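I have a process that picks up a series of "xml" files. I put xml in quotes because the text in each file has no root element, which makes it invalid XML. As part of my processing I want to correct this: open each file, add a root node at the beginning and end, and close it again. Here is what I have in mind, but it involves opening the file, reading the whole thing, tacking on the nodes, and then writing the whole file back out. These files can be more than 20 MB in size.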

        foreach (FileInfo file in files)
        {
            //open the file
            StreamReader sr = new StreamReader(file.FullName);

            // add the opening and closing tags
            string text = "<root>" + sr.ReadToEnd() + "<root>";
            sr.Close();

            // now open the same file for writing
            StreamWriter sw = new StreamWriter(file.FullName, false);
            sw.Write(text);
            sw.Close();
        }

Any suggestions?

Answer 1

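To avoid holding the whole file in memory, rename the original file, then open it with a StreamReader. Then open the original file name with a StreamWriter to create a new file.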

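Write the <root> prefix to the new file, then copy the data from the reader to the writer in large chunks. When all the data has been transferred, write the closing </root> (note the forward slash if you want it to be XML). Then close both files and delete the renamed original.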

char[] buffer = new char[10000];

string renamedFile = file.FullName + ".orig";
File.Move(file.FullName, renamedFile);

using (StreamReader sr = new StreamReader(renamedFile))
using (StreamWriter sw = new StreamWriter(file.FullName, false))
{
    sw.Write("<root>");

    int read;
    while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
        sw.Write(buffer, 0, read);

    sw.Write("</root>");
}

File.Delete(renamedFile);

Answer 2

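20 MB is not a huge amount, but when you read it in as a string it will use about 40 MB of memory, since .NET strings store two bytes per character. That isn't much either, but it is processing you don't need to do. You can handle the data as raw bytes instead, which reduces memory use and avoids decoding and re-encoding the data: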

byte[] start = Encoding.UTF8.GetBytes("<root>");
byte[] ending = Encoding.UTF8.GetBytes("</root>");

byte[] data = File.ReadAllBytes(file.FullName);

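// A UTF-8 BOM is the three bytes EF BB BF; the length check also guards against empty files
int bom = (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) ? 3 : 0;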

using (FileStream s = File.Create(file.FullName)) {
   if (bom > 0) {
      s.Write(data, 0, bom);
   }
   s.Write(start, 0, start.Length);
   s.Write(data, bom, data.Length - bom);
   s.Write(ending, 0, ending.Length);
}

If you need to bring the memory usage down further, use a second file as Earwicker suggested.

Edit:
Added code to handle the BOM (byte order mark).
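
For what it's worth, the two ideas (raw bytes plus a second file) can be combined so that only a small buffer is ever held in memory. The following is only a rough sketch under a few assumptions: the class and method names (`RootWrapper`, `WrapInRoot`) and the `.tmp` suffix are made up for illustration, the file is assumed to be UTF-8 or another ASCII-compatible encoding, and the temporary path is assumed not to exist yet.

```csharp
using System.IO;
using System.Text;

static class RootWrapper
{
    // Hypothetical helper: wraps the contents of 'path' in <root>...</root>
    // by streaming through a temporary second file.
    public static void WrapInRoot(string path)
    {
        byte[] start = Encoding.ASCII.GetBytes("<root>");
        byte[] ending = Encoding.ASCII.GetBytes("</root>");
        string tempPath = path + ".tmp"; // assumption: this name is free to use

        using (FileStream input = File.OpenRead(path))
        using (FileStream output = File.Create(tempPath))
        {
            // Read up to three bytes so a UTF-8 BOM (EF BB BF) can be detected.
            byte[] head = new byte[3];
            int headLength = 0;
            int n;
            while (headLength < head.Length &&
                   (n = input.Read(head, headLength, head.Length - headLength)) > 0)
            {
                headLength += n;
            }

            bool hasBom = headLength == 3 &&
                          head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;

            if (hasBom)
            {
                output.Write(head, 0, 3);             // the BOM must stay first
                output.Write(start, 0, start.Length);
            }
            else
            {
                output.Write(start, 0, start.Length);
                output.Write(head, 0, headLength);    // these bytes were real data
            }

            input.CopyTo(output);                     // copies the rest in chunks
            output.Write(ending, 0, ending.Length);
        }

        // Swap the new file in for the original. This is not atomic.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

You would call `RootWrapper.WrapInRoot(file.FullName)` inside the loop over the files. If a crash between the delete and the move is a concern, keeping the original under a backup name until the move succeeds would be safer.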

Answer 3

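I can't see any real improvement to be made here... which is a little disappointing. Since there is no way to "shift" the contents of a file, you always have to move the bytes of the entire file to inject anything at the top.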

You may find some performance benefit in using a raw stream rather than a StreamReader, which has to actually decode the stream as text.
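
To illustrate the raw-stream idea, here is a minimal sketch that works on any pair of streams and never decodes the payload. The names `StreamWrapper` and `CopyWithRoot` are invented for this example, and it deliberately ignores any byte order mark, so it assumes the input has none and uses an ASCII-compatible encoding.

```csharp
using System.IO;
using System.Text;

static class StreamWrapper
{
    // Hypothetical helper: writes <root>, then every byte of 'input'
    // unchanged, then </root>, to 'output'.
    public static void CopyWithRoot(Stream input, Stream output)
    {
        byte[] start = Encoding.ASCII.GetBytes("<root>");
        byte[] ending = Encoding.ASCII.GetBytes("</root>");

        output.Write(start, 0, start.Length);
        input.CopyTo(output);                 // byte-for-byte copy, no text decoding
        output.Write(ending, 0, ending.Length);
    }
}
```

With files, `input` would be a FileStream over the renamed original and `output` a FileStream over the new file, so every byte still gets rewritten once, as noted above.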

Answer 4

If you don't have to do this in C#, it can be handled easily from the command line or in a batch file.

ECHO ^<root^> > outfile.xml
TYPE temp.xml >> outfile.xml
ECHO ^</root^> >> outfile.xml

This assumes you have some existing process for fetching the data files that you can hook into.

Adding text to the beginning and end of a file in C#
