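Is there a more elegant or faster way to write the code below? It currently takes about 45 seconds.
query.sql is 200,000 lines long, and each line contains SQL like this: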
SELECT N'+dave' AS [AccountName], N'20005' AS [EmployeeID], N'-6' AS [PlatformID] UNION ALL
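I found that writing the output in chunks of 1000 lines was much faster than waiting until the end and calling WriteAllText once (that version took about 20 minutes to run).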
static void Main(string[] args)
{
    var s = new Stopwatch();
    s.Start();
    string textToWrite = "";
    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('\'');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
            "'" + value1 + "', "
            + value2 + ", "
            + value3 + ")";
        textToWrite += message + Environment.NewLine;
        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
            textToWrite = "";
        }
        i++;
    }
    //File.WriteAllText(@"e:\temp\query2.sql", textToWrite);
    File.AppendAllText(@"e:\temp\query2.sql", textToWrite);
    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}m", ts.TotalMinutes);
    Console.WriteLine("Total records: " + i);
    Console.ReadLine();
}
Edit: StringBuilder solution (1000 ms):
static void Main2(string[] args)
{
    var s = new Stopwatch();
    s.Start();
    var textToWrite = new StringBuilder();
    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
    int i = 0;
    foreach (var line in lines)
    {
        var bits = line.Split('\'');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
            "'" + value1 + "', "
            + value2 + ", "
            + value3 + ")"
            + Environment.NewLine;
        textToWrite.Append(message);
        // Buffering
        if (i % 1000 == 0)
        {
            Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
            File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
            textToWrite = new StringBuilder();
        }
        i++;
    }
    File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
    Console.WriteLine("Total records: " + i);
    Console.ReadLine();
}
Edit: StreamWriter solution (450 ms):
static void Main(string[] args)
{
    var s = new Stopwatch();
    s.Start();
    string[] lines = File.ReadAllLines(@"e:\temp\query.sql");
    int i = 0;
    using (StreamWriter writer = File.AppendText(@"e:\temp\query2.sql"))
    {
        foreach (var line in lines)
        {
            var bits = line.Split('\'');
            var value1 = bits[1];
            var value2 = bits[3];
            var value3 = bits[5];
            writer.WriteLine("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
                value1, value2, value3);
            i++;
        }
    }
    s.Stop();
    TimeSpan ts = s.Elapsed;
    Console.WriteLine("Timespan: {0}ms", ts.TotalMilliseconds);
    Console.WriteLine("Total records: " + i);
    Console.ReadLine();
}
Answer 0 (score: 4)
As others have pointed out, use a StringBuilder. In your case, declare:
StringBuilder textToWrite = new StringBuilder();
Then:
textToWrite.AppendLine(message);
if (i % 1000 == 0)
{
    Console.WriteLine(i + " " + DateTime.Now.ToLongTimeString());
    File.AppendAllText(@"e:\temp\query2.sql", textToWrite.ToString());
    textToWrite = new StringBuilder();
}
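That said, you'd probably be better off dropping the manual buffering altogether, since StreamWriter already buffers its output: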
using (StreamWriter writer = File.AppendText(filename))
{
    // initialization stuff here
    foreach (var line in lines)
    {
        var bits = line.Split('\'');
        var value1 = bits[1];
        var value2 = bits[3];
        var value3 = bits[5];
        var message = "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N" +
            "'" + value1 + "', "
            + value2 + ", "
            + value3 + ")";
        writer.WriteLine(message); // write the line
    }
}
Answer 1 (score: 2)
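A good start would be to use StringBuilder, a class built into .NET. It avoids a lot of string allocation and copying.
See the MSDN documentation for how it works: http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx
See also this Stack Overflow post for more information: Most efficient way to concatenate strings?
Example: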
StringBuilder a = new StringBuilder();
a.Append("some text");
a.Append("more text");
string result = a.ToString();
Answer 2 (score: 1)
Which version of SQL Server are you using? The best way to do this is not one giant SQL script at all, but a table-valued parameter or SQL Server's bulk copy support.
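To make the bulk-copy route concrete, here is a minimal sketch (not the answerer's code) that parses the question's lines into a System.Data.DataTable. SqlBulkCopy.WriteToServer accepts a DataTable, and the same table could also be passed as a table-valued parameter if a matching table type exists on the server. The class name BulkCopyTable is hypothetical, the int column types are an assumption, and the server call is left as a comment because it needs the Microsoft.Data.SqlClient package and a live connection. Without explicit ColumnMappings, SqlBulkCopy maps columns by position, so mapping them by name is the safer choice.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Globalization;

public static class BulkCopyTable
{
    // Parses the question's SELECT lines into a DataTable whose column names match the
    // target table. ASSUMPTION: EmployeeID and PlatformID are int columns (the generated
    // INSERTs pass them unquoted, but the real schema is not shown).
    public static DataTable Build(IEnumerable<string> lines)
    {
        var table = new DataTable();
        table.Columns.Add("AccountName", typeof(string));
        table.Columns.Add("EmployeeID", typeof(int));
        table.Columns.Add("PlatformID", typeof(int));

        foreach (string line in lines)
        {
            string[] bits = line.Split('\''); // same quote-splitting as the question
            table.Rows.Add(
                bits[1],
                int.Parse(bits[3], CultureInfo.InvariantCulture),
                int.Parse(bits[5], CultureInfo.InvariantCulture)); // a leading sign such as "-6" is accepted
        }
        return table;
    }

    // Sending it with Microsoft.Data.SqlClient (package not referenced here) might look like:
    //   using var bulk = new SqlBulkCopy(connectionString)
    //   {
    //       DestinationTableName = "[PreStaging].[Import_AccountEmployeeMapping]"
    //   };
    //   bulk.ColumnMappings.Add("AccountName", "AccountName"); // likewise for the other two
    //   bulk.WriteToServer(table);
}
```

This replaces 200,000 individual INSERT statements with a single bulk operation on the server side.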
Answer 3 (score: 1)
Your best bet is probably to open both files at the same time, read and write each line as you go, and then close the files.
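A minimal sketch of that approach, assuming every line has the same shape as the sample in the question. File.ReadLines enumerates the input lazily instead of loading all 200,000 lines up front the way ReadAllLines does, and each INSERT is handed to the StreamWriter as soon as it is built. The names StreamingConverter, ToInsert and Convert are hypothetical, and the quote-splitting comes from the question, so an account name containing an apostrophe would break it.

```csharp
using System.IO;

public static class StreamingConverter
{
    // Turns one "SELECT N'a' AS [AccountName], N'b' AS [EmployeeID], N'c' AS [PlatformID] ..." line
    // into the matching INSERT statement, using the same Split('\'') approach as the question.
    public static string ToInsert(string line)
    {
        string[] bits = line.Split('\'');
        return "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) " +
               $"VALUES (N'{bits[1]}', {bits[3]}, {bits[5]})";
    }

    // Reads and writes one line at a time: File.ReadLines yields lines lazily, so only the
    // current line is held in memory, and StreamWriter buffers the output internally.
    public static int Convert(string inputPath, string outputPath)
    {
        int count = 0;
        using (var writer = new StreamWriter(outputPath, append: false))
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                if (line.Length == 0) continue; // skip blank lines
                writer.WriteLine(ToInsert(line));
                count++;
            }
        }
        return count;
    }
}
```

Usage: StreamingConverter.Convert(@"e:\temp\query.sql", @"e:\temp\query2.sql"). Unlike File.AppendText in the question, this overwrites the output file, so running it twice does not duplicate the statements.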
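The biggest problem you are most likely running into, though, is the string concatenation. Strings in .NET are immutable, so every concatenation allocates a new copy, which costs time and memory (although the GC eventually reclaims the memory).
If you replace textToWrite with a StringBuilder and call ToString() only once at the end, you'll see much better results.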
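For comparison, here is a sketch of the StringBuilder version with no chunking and a single ToString() at the end. The class name StringBuilderConverter is hypothetical, and it assumes the whole output fits comfortably in memory.

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringBuilderConverter
{
    // Appends every INSERT statement to one StringBuilder, which adds to its internal
    // buffer instead of copying all the text accumulated so far (as "+=" on a string does),
    // and calls ToString() exactly once at the end.
    public static string Convert(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            string[] bits = line.Split('\''); // same quote-splitting as the question
            sb.Append("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'")
              .Append(bits[1]).Append("', ")
              .Append(bits[3]).Append(", ")
              .Append(bits[5]).Append(')')
              .AppendLine(); // Environment.NewLine
        }
        return sb.ToString(); // the only ToString() call
    }
}
```

The result can then be written with one call, e.g. File.WriteAllText(@"e:\temp\query2.sql", StringBuilderConverter.Convert(File.ReadLines(@"e:\temp\query.sql"))).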
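Or, honestly, you could do a single regex replace over the whole thing and be done with it, although I believe you'd need to read the entire file into memory first, which you're already doing.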
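A sketch of the regex route, assuming each line matches the sample exactly (same column aliases, single spaces, an optional trailing UNION ALL). Lines that do not match are left unchanged, so the output is worth spot-checking. RegexConverter is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class RegexConverter
{
    // Matches one input line per match (Multiline makes ^ and $ work per line). The lookahead
    // (?=\r?$) checks for the end of the line without consuming the line break, so both
    // "\n" and "\r\n" endings are preserved.
    private static readonly Regex SelectLine = new Regex(
        @"^SELECT N'([^']*)' AS \[AccountName\], N'([^']*)' AS \[EmployeeID\], N'([^']*)' AS \[PlatformID\](?: UNION ALL)?(?=\r?$)",
        RegexOptions.Multiline);

    // $1, $2 and $3 refer to the three captured values.
    private const string InsertTemplate =
        "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'$1', $2, $3)";

    // One Replace call over the entire file contents.
    public static string Convert(string wholeFile) => SelectLine.Replace(wholeFile, InsertTemplate);
}
```

Usage: File.WriteAllText(@"e:\temp\query2.sql", RegexConverter.Convert(File.ReadAllText(@"e:\temp\query.sql"))).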
Answer 4 (score: 0)
MemoryMappedFiles are efficient, so they may be worth looking into.
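using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Linq;
using System.Text;

string[] lines = File.ReadAllLines(@"e:\temp\query.sql");

// Format every output line first: each INSERT is longer than its SELECT line,
// so the input file's length would be too small a capacity for the mapping.
string[] output = new string[lines.Length];
for (int i = 0; i < lines.Length; i++)
{
    var bits = lines[i].Split('\'');
    output[i] = string.Format(
        "INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'{0}', {1}, {2})",
        bits[1], bits[3], bits[5]);
}

// Size the mapping to the exact UTF-8 byte count so the file is not padded with zero bytes.
var encoding = new UTF8Encoding(false); // no byte-order mark
long capacity = output.Sum(l => (long)encoding.GetByteCount(l + Environment.NewLine));

// mapName is null: named maps are not supported on every platform and are not needed here.
using (var mmf = MemoryMappedFile.CreateFromFile(@"e:\temp\query2.sql", FileMode.Create, null, capacity))
using (MemoryMappedViewStream mmvs = mmf.CreateViewStream())
using (var writer = new StreamWriter(mmvs, encoding))
{
    foreach (var line in output)
        writer.WriteLine(line);
} // disposing the writer flushes its buffer into the mapped view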
You may find it better to build the whole text first and then write all of it to the MemoryMappedFile, because that involves fewer calls to ToString.
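A sketch of that variant, under the same assumptions about the input format as the question: the full text is built once, encoded once, and copied into a mapping sized to exactly that many bytes. MappedFileWriter is a hypothetical name. For a single sequential write like this, a memory-mapped file is not guaranteed to beat a plain StreamWriter, so it is worth timing both.

```csharp
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

public static class MappedFileWriter
{
    // Builds the complete output first, encodes it once, then copies the bytes into a
    // memory-mapped file whose capacity equals the byte count (a larger capacity would
    // pad the file with zero bytes).
    public static void Convert(string[] lines, string outputPath)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
        {
            string[] bits = line.Split('\''); // same quote-splitting as the question
            sb.Append("INSERT [PreStaging].[Import_AccountEmployeeMapping] ([AccountName], [EmployeeID], [PlatformID]) VALUES (N'")
              .Append(bits[1]).Append("', ")
              .Append(bits[3]).Append(", ")
              .Append(bits[5]).Append(')')
              .AppendLine();
        }

        byte[] bytes = new UTF8Encoding(false).GetBytes(sb.ToString()); // single ToString()

        if (bytes.Length == 0)
        {
            File.WriteAllBytes(outputPath, bytes); // an empty file cannot back a zero-capacity mapping
            return;
        }

        // mapName is null: named maps are not supported on every platform and are not needed here.
        using (var mmf = MemoryMappedFile.CreateFromFile(outputPath, FileMode.Create, null, bytes.Length))
        using (var view = mmf.CreateViewStream())
        {
            view.Write(bytes, 0, bytes.Length);
        }
    }
}
```

Usage: MappedFileWriter.Convert(File.ReadAllLines(@"e:\temp\query.sql"), @"e:\temp\query2.sql").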