We ran into a problem while trying to speed up a data import with threads, and we couldn't figure it out, so we broke the process down into smaller pieces.
To start with, we have a data source of one million rows with an artificial primary key, and suppose we convert the data to XML and insert it into another data source. The idea was that with more threads, each taking an equal share of the rows and doing its part of the work, the parallelism should speed up the whole process, right? But it didn't turn out that way, so to measure the time we narrowed it down to just the part where each thread takes its slice of rows, converts them to XML, and appends the result to an in-memory data table.
Here is what we do:
private void btn_Start_Click(object sender, EventArgs e)
{
    Thread MyThread = new Thread(Action_Start);
    MyThread.Start();
}

void Action_Start()
{
    string _Threads = text_Threads.Text; // number of threads to start
    string _Bucket = text_Bucket.Text;   // number of rows to process per thread
    List<Task> MyTasks = new List<Task>();
    for (int Index = 1; Index <= Convert.ToInt32(_Threads); Index++)
    {
        int MyIndex = Index; // capture the loop variable for the lambda
        MyTasks.Add(
            Task.Factory.StartNew(
                () => DoWork(MyIndex, Convert.ToInt32(_Bucket))));
    }
    Task.WaitAll(MyTasks.ToArray());
}
async void DoWork(int p_Index, int p_Bucket)
{
    // Build an in-memory table with 20 string columns
    DataTable MyTable = new DataTable();
    for (int Index = 1; Index <= 20; Index++)
    {
        DataColumn MyColumn = new DataColumn(
            "FIELD_" + Index.ToString("0000"), typeof(String));
        MyTable.Columns.Add(MyColumn);
    }

    // Fill it with p_Bucket rows of 128-character dummy values
    for (int Index = 1; Index <= p_Bucket; Index++)
    {
        DataRow MyRow = MyTable.NewRow();
        for (int Index2 = 1; Index2 <= 20; Index2++)
        {
            string MyField = "FIELD_" + Index2.ToString("0000");
            MyRow[MyField] = new String('0', 128);
        }
        MyTable.Rows.Add(MyRow);
    }

    Stopwatch MyTimer = new Stopwatch();
    long Brojac = 1;
    DataTableReader MyReader = MyTable.CreateDataReader();

    // Time the conversion of every row to an XML string
    MyTimer.Start();
    while (await MyReader.ReadAsync())
    {
        string Result = "<Root>";
        for (int Index = 1; Index <= 20; Index++)
        {
            string MyField = "FIELD_" + Index.ToString("0000");
            XElement MyXml = new XElement("Property");
            MyXml.SetAttributeValue("Value", MyReader[MyField]);
            MyXml.SetAttributeValue("Field", MyField);
            Result += MyXml.ToString();
        }
        Brojac++;
        Result += "</Root>";
    }
    MyTimer.Stop();
    MyReader.Close();

    TimeSpan ts = MyTimer.Elapsed;
    // Format and display the elapsed TimeSpan value
    string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
        ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    string Buffer = "Processing time: " + elapsedTime +
        "; Through-put: " +
        Convert.ToInt32(Brojac / ts.TotalSeconds).ToString() +
        " records per second; Total " +
        p_Bucket.ToString("000000 ") + " records";
    Poruka(Buffer);
}
The biggest problem shows up when we run it:
1 thread for 50.000 rows; processing time is 00:00:05.49; On average 9099 records per second;
VS
4 threads for 50.000 rows (12.5k per thread); average processing time per thread is 00:00:20.80; On average 2390 records per second;
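To make the comparison concrete, here is a minimal arithmetic check of what those timings imply, assuming the 2390 records/s figure is the aggregate across all four threads (the row counts and times are taken from the two runs above):

```csharp
using System;

class ThroughputCheck
{
    static void Main()
    {
        // Run 1: one thread, 50,000 rows in 5.49 seconds
        double singleThread = 50000 / 5.49;   // ~9100 records/s

        // Run 2: four threads, 12,500 rows each, ~20.80 s per thread
        double perThread = 12500 / 20.80;     // ~600 records/s per thread
        double aggregate = perThread * 4;     // ~2400 records/s combined

        Console.WriteLine($"1 thread:  {singleThread:F0} records/s");
        Console.WriteLine($"4 threads: {perThread:F0} records/s each, " +
                          $"{aggregate:F0} records/s in aggregate");
    }
}
```

So even the combined throughput of all four threads (~2,400 records/s) is only about a quarter of the single-threaded figure (~9,100 records/s), which matches the averages reported above.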
My question is: why do the average processing time and throughput per thread drop as soon as we use more threads and tasks? Aren't they supposed to run in parallel and just chew through this data set in a few milliseconds?