C# Performance - Chunking up writing of a file with AppendAllText

Asked: 2013-08-01 16:33:21

Tags: c#

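Is there a more elegant or faster way to write the code below? It currently takes about 45 seconds.

query.sql is 200,000 lines long, and every line contains SQL like this: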

SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL

I found that writing in chunks of 1,000 lines is much faster than waiting until the end and calling WriteAllText (which takes about 20 minutes to run).

static void Main(string[] args)
{
    var s = new Stopwatch();
    s.Start();

    string textToWrite = "";
    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");

    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('\'');

        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];

        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                    "'" + value1 + "', "
                    + value2 + ", "
                    + value3 + ")";

        textToWrite += message + Environment.NewLine;

        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
            textToWrite = "";
        }
        i++;
    }

    //File.WriteAllText(@"e:\temp\query2.sql", textToWrite);
    File.AppendAllText(@"e:\temp\query2.sql", textToWrite);

    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
    Console.WriteLine("Total records: " + i);

    Console.ReadLine();
}

Edit: StringBuilder solution (1000 ms):

static void Main2(string[] args)
{
    var s = new Stopwatch();
    s.Start();

    var textToWrite = new StringBuilder();
    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");

    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('\'');

        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];

        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                    "'" + value1 + "', "
                    + value2 + ", "
                    + value3 + ")"
                    + Environment.NewLine;

        textToWrite.Append(message);

        // Buffering
        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
            textToWrite = new StringBuilder();
        }
        i++;
    }

    File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());

    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
    Console.WriteLine("Total records: " + i);

    Console.ReadLine();
}

Edit: StreamWriter solution (450 ms):

static void Main(string[] args)
{
    var s = new Stopwatch();
    s.Start();

    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
    int i = 0;
    using (StreamWriter writer = File.AppendText(@"e:\temp\query2.sql"))
    {
        foreach (var line in lines)
        {
            var bits = line.Split('\'');

            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];

            writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
                value1, value2, value3);

            i++;
        }
    }

    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
    Console.WriteLine("Total records: " + i);

    Console.ReadLine();
}

5 answers:

Answer 0 (score: 4)

As others have pointed out, use a StringBuilder. In your case, declare:

StringBuilder textToWrite = new StringBuilder();

Then:

textToWrite.AppendLine(message);
if (i % 1000 == 0)
{
    Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
    File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
    textToWrite = new StringBuilder();
}

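That said, you would probably be better off dropping the manual buffering altogether and letting a StreamWriter do it: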

using (StreamWriter writer = File.AppendText(filename))
{
    // initialization stuff here

    foreach (var line in lines)
    {
        var bits = line.Split('\'');

        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];

        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
                "'" + value1 + "', "
                + value2 + ", "
                + value3 + ")";

        writer.WriteLine(message); // write the line
    }
}

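Answer 1 (score: 2)

A good start is to use StringBuilder, a class built into .NET. It avoids a lot of string allocations and copying.

See the MSDN documentation for how it works: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

See also this Stack Overflow post for more information: Most efficient way to concatenate strings?

Example: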

StringBuilder a = new StringBuilder();
a.Append("some text");
a.Append("more text");
string result = a.ToString();

Answer 2 (score: 1)

What version of SQL Server? The best way to do this is not one huge SQL script, but a table-valued parameter or SQL Server's bulk copy support.
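
To make the bulk copy suggestion concrete, here is a rough sketch using SqlBulkCopy. It is not from the question: the connection string is a placeholder, the int column types for EmployeeID and PlatformID are guesses based on the sample values, and it assumes a runtime where System.Data.SqlClient is available (it ships with .NET Framework; newer runtimes need a package reference).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

static class BulkLoad
{
    // Parses the quoted values out of each SELECT line into an in-memory table.
    public static DataTable BuildTable(IEnumerable<string> lines)
    {
        var table = new DataTable();
        table.Columns.Add("AccountName", typeof(string));
        table.Columns.Add("EmployeeID", typeof(int)); // assumed column type
        table.Columns.Add("PlatformID", typeof(int)); // assumed column type

        foreach (var line in lines)
        {
            var bits = line.Split('\'');
            table.Rows.Add(
                bits[1],
                int.Parse(bits[3], CultureInfo.InvariantCulture),
                int.Parse(bits[5], CultureInfo.InvariantCulture));
        }
        return table;
    }

    // Sends the whole table to the server in one bulk operation.
    public static void Load(DataTable table, string connectionString)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "[PreStaging].[Import_AccountEmployeeMapping]";
            foreach (DataColumn column in table.Columns)
                bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);
            bulk.WriteToServer(table);
        }
    }

    static void Main()
    {
        // Hypothetical connection string; replace with your own.
        const string connectionString = "Server=.;Database=MyDb;Integrated Security=true";

        DataTable table = BuildTable(File.ReadLines(@"e:\temp\query.sql"));
        Load(table, connectionString);
        Console.WriteLine("Rows sent: " + table.Rows.Count);
    }
}
```

This skips generating query2.sql entirely, so it only fits if the goal is to get the rows into the table rather than to produce a script.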

Answer 3 (score: 1)

Your best bet is most likely to open both files at once, read and write one line at a time as you go, and then close the files.
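
A minimal sketch of that line-by-line approach, assuming the line shape shown in the question. The helper names ToInsert and ConvertFile are made up for illustration, and unlike the question's code this version overwrites the output file instead of appending to it.

```csharp
using System;
using System.IO;

static class StreamingConvert
{
    // Turns one SELECT line from query.sql into the matching INSERT statement.
    public static string ToInsert(string line)
    {
        var bits = line.Split('\'');
        return "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'"
            + bits[1] + "', " + bits[3] + ", " + bits[5] + ")";
    }

    // Reads the input lazily and writes each converted line straight away,
    // so the whole file is never held in memory at once.
    public static int ConvertFile(string inputPath, string outputPath)
    {
        int count = 0;
        using (var writer = new StreamWriter(outputPath, false)) // false = overwrite
        {
            foreach (var line in File.ReadLines(inputPath))
            {
                writer.WriteLine(ToInsert(line));
                count++;
            }
        }
        return count;
    }

    static void Main()
    {
        int n = ConvertFile(@"e:\temp\query.sql", @"e:\temp\query2.sql");
        Console.WriteLine("Total records: " + n);
    }
}
```

File.ReadLines enumerates the file lazily, whereas File.ReadAllLines loads every line into an array first.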

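However, the biggest problem you are running into is most likely the string concatenation. Strings in .NET are immutable, so every concatenation allocates a new copy, which costs both time and memory (although the GC will eventually reclaim the latter).

If you replace textToWrite with a StringBuilder and call ToString() only once at the end, you will see a lot better performance.

Or, honestly, you could just run a regex replace over the whole thing and be done with it, although I believe that means reading the entire file into memory first, which you are already doing anyway.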
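
To see the difference for yourself, something like the following throwaway comparison should do. The loop size is arbitrary and the timings will vary by machine, so treat it as a way to observe the trend rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConcatDemo
{
    // Appends with +=; every iteration allocates a new string and copies the old contents.
    public static string WithConcat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++)
            s += "line " + i + "\n";
        return s;
    }

    // Appends into a growable buffer; ToString() is called once at the end.
    public static string WithBuilder(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++)
            sb.Append("line ").Append(i).Append('\n');
        return sb.ToString();
    }

    static void Main()
    {
        const int n = 20000; // arbitrary size chosen for the demonstration

        var sw = Stopwatch.StartNew();
        string a = WithConcat(n);
        Console.WriteLine("+=            : {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        string b = WithBuilder(n);
        Console.WriteLine("StringBuilder : {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine("Same output: " + (a == b));
    }
}
```

The += version copies the whole accumulated string on every pass, so its cost grows roughly with the square of the number of lines, which matches the 20 minute WriteAllText run described in the question.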
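
A sketch of what the regex version might look like. The pattern is an assumption built from the single sample line in the question, so lines that deviate from that shape are left unchanged.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

static class RegexConvert
{
    // Assumed line shape, taken from the sample line in the question:
    // SELECT N'a' AS [AccountName], N'b' AS [EmployeeID], N'c' AS [PlatformID] UNION ALL
    // [^\r\n]* swallows the rest of the line but leaves the line break in place.
    static readonly Regex Pattern = new Regex(
        @"^SELECT N'([^']*)' AS \[AccountName\], N'([^']*)' AS \[EmployeeID\], N'([^']*)' AS \[PlatformID\][^\r\n]*",
        RegexOptions.Multiline);

    const string Replacement =
        "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'${1}', ${2}, ${3})";

    // Rewrites every matching line of the script in a single pass.
    public static string Convert(string sql)
    {
        return Pattern.Replace(sql, Replacement);
    }

    static void Main()
    {
        string input = File.ReadAllText(@"e:\temp\query.sql");
        File.WriteAllText(@"e:\temp\query2.sql", Convert(input));
    }
}
```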

Answer 4 (score: 0)

MemoryMappedFiles are very efficient, so they may be worth looking into.

string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
// Assumption: the output is at most twice the size of the input. A view stream cannot
// grow past the map's capacity, and unused capacity stays in the file as trailing NUL bytes.
long capacity = new FileInfo(@"e:\temp\query.sql").Length * 2;
using (var mmf = MemoryMappedFile.CreateFromFile(@"e:\temp\query2.sql", FileMode.Create, "txt", capacity))
using (MemoryMappedViewStream mmvs = mmf.CreateViewStream())
using (var writer = new StreamWriter(mmvs)) // disposing the writer flushes it into the view
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < lines.Length; i++)
    {
        var bits = lines[i].Split('\'');

        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];

        sb.Clear(); // reuse the builder so each line is written only once
        sb.AppendFormat("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})", value1, value2, value3);

        writer.WriteLine(sb.ToString());
    }
}

You may find it works better to build the entire text first and then write it to the MemoryMappedFile in one go, since that means fewer calls to ToString.