We ran into a problem while trying to speed up a data import with threads, and we couldn't figure it out, so we broke the process down into smaller pieces.
To start with, we have a data source of one million rows with an artificial primary key, and suppose we convert the data to XML and insert it into another data source. The idea was that with more threads, each taking an equal share of the rows and doing its part of the work, the parallelism should speed up the whole process, right? But it didn't turn out that way, so to measure the time we narrowed it down to just the part where each thread takes its slice of rows, converts them to XML, and appends the result to an in-memory data table.
Here is what we do:
private void btn_Start_Click(object sender, EventArgs e)
{
    Thread MyThread = new Thread(Action_Start);
    MyThread.Start();
}

void Action_Start()
{
    string _Threads = text_Threads.Text; // number of threads to start
    string _Bucket = text_Bucket.Text;   // number of rows to process per thread
    List<Task> MyTasks = new List<Task>();
    for (int Index = 1; Index <= Convert.ToInt32(_Threads); Index++)
    {
        int MyIndex = Index; // capture the loop variable for the lambda
        MyTasks.Add(
            Task.Factory.StartNew(
                () => DoWork(MyIndex, Convert.ToInt32(_Bucket))));
    }
    Task.WaitAll(MyTasks.ToArray());
}
async void DoWork(int p_Index, int p_Bucket)
{
    // Build an in-memory table with 20 string columns
    DataTable MyTable = new DataTable();
    for (int Index = 1; Index <= 20; Index++)
    {
        DataColumn MyColumn = new DataColumn(
            "FIELD_" + Index.ToString("0000"), typeof(String));
        MyTable.Columns.Add(MyColumn);
    }

    // Fill it with p_Bucket rows of 128-character dummy values
    for (int Index = 1; Index <= p_Bucket; Index++)
    {
        DataRow MyRow = MyTable.NewRow();
        for (int Index2 = 1; Index2 <= 20; Index2++)
        {
            string MyField = "FIELD_" + Index2.ToString("0000");
            MyRow[MyField] = new String('0', 128);
        }
        MyTable.Rows.Add(MyRow);
    }

    Stopwatch MyTimer = new Stopwatch();
    long Brojac = 1;
    DataTableReader MyReader = MyTable.CreateDataReader();

    // Time the conversion of every row to an XML string
    MyTimer.Start();
    while (await MyReader.ReadAsync())
    {
        string Result = "<Root>";
        for (int Index = 1; Index <= 20; Index++)
        {
            string MyField = "FIELD_" + Index.ToString("0000");
            XElement MyXml = new XElement("Property");
            MyXml.SetAttributeValue("Value", MyReader[MyField]);
            MyXml.SetAttributeValue("Field", MyField);
            Result += MyXml.ToString();
        }
        Brojac++;
        Result += "</Root>";
    }
    MyTimer.Stop();
    MyReader.Close();

    TimeSpan ts = MyTimer.Elapsed;
    // Format and display the elapsed TimeSpan value
    string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
        ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
    string Buffer = "Processing time: " + elapsedTime +
        "; Through-put: " +
        Convert.ToInt32(Brojac / ts.TotalSeconds).ToString() +
        " records per second; Total " +
        p_Bucket.ToString("000000 ") + " records";
    Poruka(Buffer);
}
The biggest problem shows up when we run it:
1 thread for 50.000 rows; processing time is 00:00:05.49; On average 9099 records per second;
VS
4 threads for 50.000 rows (12.5k per thread); average processing time per thread is 00:00:20.80; On average 2390 records per second;
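To make the comparison concrete, here is a minimal arithmetic check of what those timings imply, assuming the 2390 records/s figure is the aggregate across all four threads (the row counts and times are taken from the two runs above):

```csharp
using System;

class ThroughputCheck
{
    static void Main()
    {
        // Run 1: one thread, 50,000 rows in 5.49 seconds
        double singleThread = 50000 / 5.49;   // ~9100 records/s

        // Run 2: four threads, 12,500 rows each, ~20.80 s per thread
        double perThread = 12500 / 20.80;     // ~600 records/s per thread
        double aggregate = perThread * 4;     // ~2400 records/s combined

        Console.WriteLine($"1 thread:  {singleThread:F0} records/s");
        Console.WriteLine($"4 threads: {perThread:F0} records/s each, " +
                          $"{aggregate:F0} records/s in aggregate");
    }
}
```

So even the combined throughput of all four threads (~2,400 records/s) is only about a quarter of the single-threaded figure (~9,100 records/s), which matches the averages reported above.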
My question is: why do the average processing time and throughput per thread drop as soon as we use more threads and tasks? Aren't they supposed to run in parallel and just chew through this data set in a few milliseconds?