Sequential vs parallel solution memory usage

Time: 2015-06-26 09:56:01

Tags: c# .net oracle memory-leaks task

I have a slight issue with the following scenario: I'm given a list of ID values, and for each one I need to run a SELECT query (with the ID as a parameter), then combine all the result sets into one big list and return it to the caller.

Since the query might run for minutes per ID (that's another issue, but for now I take it as a given), and there can be thousands of IDs in the input, I tried to use tasks. With that approach I see a slow but steady increase in memory use.

As a test, I also wrote a simple sequential solution. It has a normal memory usage graph but, as expected, is very slow: memory increases while it's running, then drops back to the normal level when it's finished.

Here's the skeleton of the code (both variants are shown in the same method; only one is used at a time):

public class RowItem
{
    public int ID { get; set; }
    public string Name { get; set; }
    //the rest of the properties
}


public List<RowItem> GetRowItems(List<int> customerIDs)
{
    var rowItems = new List<RowItem>();

    // this solution has the memory leak
    var tasks = new List<Task<List<RowItem>>>();
    foreach (var customerID in customerIDs)
    {
        var task = Task.Factory.StartNew(() => ProcessCustomerID(customerID));
        tasks.Add(task);
    }

    while (tasks.Any())
    {
        var index = Task.WaitAny(tasks.ToArray());
        var task = tasks[index];
        rowItems.AddRange(task.Result);
        tasks.RemoveAt(index);
    }

    // this works fine, but slow
    foreach (var customerID in customerIDs)
    {
        rowItems.AddRange(ProcessCustomerID(customerID));
    }

    return rowItems;
}

private List<RowItem> ProcessCustomerID(int customerID)
{
    var rowItems = new List<RowItem>();
    using (var conn = new OracleConnection("XXX"))
    {
        conn.Open();
        var sql = "SELECT * FROM ...";
        using (var command = new OracleCommand(sql, conn))
        {
            using (var dataReader = command.ExecuteReader())
            {
                using (var dataTable = new DataTable())
                {
                    dataTable.Load(dataReader);
                    rowItems = dataTable
                               .Rows
                               .OfType<DataRow>()
                               .Select(
                                   row => new RowItem
                                   {
                                       ID = Convert.ToInt32(row["ID"]),
                                       Name = row["Name"].ToString(),
                                       //the rest of the properties
                                   })
                               .ToList();
                }
            }
        }
        conn.Close();
    }
    return rowItems;
}

What am I doing wrong when using tasks? According to this MSDN article, I don't need to bother disposing of them manually, and there's barely anything else in the task-based code that could hold on to memory. I guess ProcessCustomerID is OK, as it's called in both variations.

Update: To log the current memory usage I used Process.GetCurrentProcess().PrivateMemorySize64, but I first noticed the problem in Task Manager >> Processes.

2 answers:

Answer 0 (score: 0)

Using Entity Framework, your ProcessCustomerID method could look like this:

List<RowItem> rowItems;
using(var ctx = new OracleEntities()){
  rowItems = ctx.Customer
    .Where(o => o.id == customerID)
    .Select(o => new RowItem
      {
        ID = o.id,
        Name = o.name, // assumes the entity exposes a "name" property
        //the rest of the properties
      }
    ).ToList();
}
return rowItems;

Unless you are transferring large amounts of data such as images, video or other blobs, this should be near instantaneous for a result on the order of 1k.

If it is unclear what is taking the time and you are on a pre-10g Oracle release, this would be really hard to monitor. However, if you use Entity Framework you can attach a profiler to it: http://www.hibernatingrhinos.com/products/efprof

As of at least a year ago, Oracle supported Entity Framework 5.

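In the sequential version the queries are executed one by one; in the parallel version they are all queued at essentially the same time, which consumes your resources (threads, connections, and result sets held in memory) and may cause contention or deadlocks.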
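
To illustrate the point about everything being started at once, here is a minimal sketch of one way to cap how many queries are in flight. It swaps the Task.Factory.StartNew loop from the question for Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism. The class name BoundedQueryRunner, the Run method and the maxParallel parameter are hypothetical names made up for this example, and a suitable limit would need to be found by measuring.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code.
public static class BoundedQueryRunner
{
    // Calls `query` once per ID with at most `maxParallel` calls running
    // concurrently, then flattens the per-ID results into one list.
    // The order of the combined rows is not guaranteed.
    public static List<TRow> Run<TRow>(
        IEnumerable<int> ids,
        Func<int, List<TRow>> query,
        int maxParallel)
    {
        // Thread-safe collection, since several workers add to it at once.
        var results = new ConcurrentBag<List<TRow>>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(ids, options, id => results.Add(query(id)));

        return results.SelectMany(rows => rows).ToList();
    }
}
```

In GetRowItems this could be called as BoundedQueryRunner.Run(customerIDs, id => ProcessCustomerID(id), 8), where 8 is an arbitrary placeholder. Whether this changes the memory curve in the question is not established here; it only limits how many result sets can be held by running queries at any moment.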

Answer 1 (score: 0)

I don't think you have any evidence that there is a memory leak in the parallel execution.

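It may be that garbage collection happens at different times, which is why you see two different readings. You can't expect memory to be released in real time; .NET garbage collection occurs only when it is needed. Have a look at "Fundamentals of Garbage Collection".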

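Task Manager or Process.GetCurrentProcess().PrivateMemorySize64 may not be a very accurate way to look for memory leaks. If you do use them, at least make sure to force a full garbage collection and wait for pending finalizers before reading the memory counters.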

GC.Collect();
GC.WaitForPendingFinalizers();
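
As a rough sketch of how those two calls could be wrapped for logging, the helper below forces a collection, waits for finalizers, collects once more so that objects released by those finalizers are reclaimed as well, and then reads the counters. MemoryProbe and its method names are hypothetical, chosen only for this example. GC.GetTotalMemory reports managed heap bytes, while PrivateMemorySize64 also includes unmanaged allocations, so the two numbers are not expected to match.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for diagnostics only; forcing collections like this
// is not something to leave in production code paths.
public static class MemoryProbe
{
    // Managed heap size in bytes, read after a forced full collection.
    public static long ManagedBytesAfterFullGc()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // reclaim objects that the finalizers just released
        return GC.GetTotalMemory(false);
    }

    // Private bytes of the current process, the same counter the question logs.
    public static long PrivateBytes()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return process.PrivateMemorySize64;
        }
    }
}
```

Logging both values before the run, after the run, and again a little later gives a fairer comparison between the sequential and the task-based variant than watching Task Manager alone.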