Parallel.ForEach loses data

Date: 2018-05-22 14:12:52

Tags: c# task-parallel-library

My code runs too slowly:

DataTable result = GetDataTable();
while (reader.Read())
{
    var a = reader.Field<int>("a").ToString();
    var b = reader.Field<int>("b").ToString();
    var c = reader.Field<double>("c");
    var d = reader.Field<string>("d");
    var e = reader.Field<string>("e");

    DataRow datarow = result.AsEnumerable().FirstOrDefault(r => r.Field<string>("A") == a && r.Field<string>("D") == d);

    if (datarow == null)
    {
        datarow = result.NewRow();
        datarow["A"] = a;
        datarow["D"] = d;
        datarow["E"] = e;
        result.Rows.Add(datarow);
    }
    datarow[b] = c;
}
return result;

I changed it to use the TPL. It now looks like this:

var result = GetDataTable();
var concurrentCollection = new ConcurrentDictionary<string, SomeClass>();

Parallel.ForEach(reader.ToDataTable().AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = 2 }, row =>
{
    var a = reader.Field<int>("a").ToString();
    var b = reader.Field<int>("b").ToString();
    var c = reader.Field<double>("c");
    var d = reader.Field<string>("d");

    var values = concurrentCollection.FirstOrDefault(r => r.Key.ToString() == $"{a}|{d}");

    if (values.Key == null)
    {
        var data = new SomeClass
        {
            Dictionary =
            {
                ["A"] = a,
                ["D"] = d,
                ["E"] = reader.Field<string>("e")
            }
        };
        values = new KeyValuePair<string, SomeClass>($"{a}|{d}", data);
    }
    values.Value.Dictionary[b] = c;

    concurrentCollection.AddOrUpdate(values.Key, values.Value, (key, oldValue) => values.Value);
});

foreach (var ins in concurrentCollection.OrderBy(x => x.Value.Dictionary["D"]).ThenBy(x => x.Value.Dictionary["A"]))
{
    var datarow = result.NewRow();
    foreach (var key in ins.Value.Dictionary.Keys)
    {
        datarow[key.ToString()] = ins.Value.Dictionary[key];
    }
    result.Rows.Add(datarow);
}
concurrentCollection.Clear();
return result;

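If I set MaxDegreeOfParallelism to 1, both versions of the code produce the same result. But when I set MaxDegreeOfParallelism to any other value, the resulting data starts to differ. The larger the value of MaxDegreeOfParallelism, the more the results differ.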

Below are the contents of the result variable, serialized to JSON.

Partial result from the first version of the code:

  

[{ "A": "1010", "1": "744", "2": "736", "3": "8", "4": null, "5": null, "6": null, "7": null, "8": null, "9": null, "10": null, "B": "data", "E": "0.4" }, ...]

Partial result from the second version of the code:

  

[{ "A": "1010", "1": "744", "2": null, "3": null, "4": null, "5": null, "6": null, "7": null, "8": null, "9": null, "10": null, "B": "data", "E": "0.4" }, ...]

The number of mismatched objects in the resulting JSON array is different on every run.

2 answers:

Answer 0: (score: 3)

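Perhaps you are approaching this the wrong way. I assume the slow part is finding the matching row in result, since FirstOrDefault scans the whole table for every record read. Try building a dictionary keyed on the fields you need to look up; a dictionary lookup is close to O(1). If fields A and D are not unique in result, use ToLookup() instead and take the first row from the lookup result for the key (which is equivalent to your current logic).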

DataTable result = GetDataTable();
var dic  = result.AsEnumerable().ToDictionary(r => new { A = r.Field<string>("A"), D = r.Field<string>("D")});
while (reader.Read())
{
    var a = reader.Field<int>("a").ToString();
    var b = reader.Field<int>("b").ToString();
    var c = reader.Field<double>("c");
    var d = reader.Field<string>("d");
    var e = reader.Field<string>("e");

    DataRow datarow;
    if(!dic.TryGetValue(new{A = a, D = d}, out datarow))
    {
        datarow = result.NewRow();
        datarow["A"] = a;
        datarow["D"] = d;
        datarow["E"] = e;
        result.Rows.Add(datarow);
        dic.Add(new{A = a, D = d}, datarow);
    }
    datarow[b] = c;
}
return result;
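
For the case where A and D are not unique in result, the ToLookup() suggestion might look roughly like the sketch below. The names LookupSketch and IndexFirstRows are hypothetical, and the tuple key is a substitution for the anonymous type used above (it assumes C# 7 or later). Treat it as an illustration under those assumptions, not as tested production code.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper; not part of the original answer.
public static class LookupSketch
{
    // Groups rows by (A, D) and keeps the first row of each group,
    // mirroring what FirstOrDefault returns in the question's loop.
    public static Dictionary<(string A, string D), DataRow> IndexFirstRows(DataTable table)
    {
        return table.AsEnumerable()
            .ToLookup(r => (A: r.Field<string>("A"), D: r.Field<string>("D")))
            .ToDictionary(g => g.Key, g => g.First());
    }
}
```

An ILookup cannot be modified after it is built, so the sketch converts it to a dictionary. That way the loop can still register rows it creates, for example with index.Add((a, d), datarow).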

Answer 1: (score: 0)

I would use a class that overrides Equals and GetHashCode, and use a HashSet for O(1) lookups.
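
A minimal sketch of that idea, assuming the key consists of the A and D fields from the question. The names RowKey, HashSetSketch and CountDistinct are hypothetical and do not come from this answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: two keys are equal when both A and D match.
public sealed class RowKey : IEquatable<RowKey>
{
    public string A { get; }
    public string D { get; }

    public RowKey(string a, string d)
    {
        A = a;
        D = d;
    }

    public bool Equals(RowKey other) =>
        other != null && A == other.A && D == other.D;

    public override bool Equals(object obj) => Equals(obj as RowKey);

    public override int GetHashCode()
    {
        unchecked
        {
            // Combine both fields; null fields contribute 0.
            int hash = 17;
            hash = hash * 31 + (A?.GetHashCode() ?? 0);
            hash = hash * 31 + (D?.GetHashCode() ?? 0);
            return hash;
        }
    }
}

public static class HashSetSketch
{
    // Counts distinct (A, D) pairs; HashSet<T>.Add ignores duplicates.
    public static int CountDistinct(IEnumerable<RowKey> keys)
    {
        var seen = new HashSet<RowKey>();
        foreach (var key in keys)
        {
            seen.Add(key);
        }
        return seen.Count;
    }
}
```

A HashSet only answers whether a key has been seen. To get back to the matching DataRow, the same key type could serve as the key of a Dictionary<RowKey, DataRow>.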
