Question

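I have a bounded set of consumer threads, each of which consumes a job. Once a thread has processed its job, it has a list of sub-jobs that were listed in the consumed job. I need to add the sub-jobs from that list that are not already in the database. There are 3 million in the database, so working out which ones are not yet stored is slow. I don't mind each thread blocking on that call, but because I have a race condition (see the code), I have to lock them all around that slow call, so only one thread at a time can run that section and my program crawls. What can I do to fix this so the threads are not slowed down by that call? I tried a queue, but since the threads push out lists of jobs faster than the computer can determine which jobs should be added to the database, I end up with a queue that keeps growing and never empties.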

My code:

IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID)
{
    return subJobs.Select(el => el.name)
        .Except(db.jobs.Where(el => el.set_ID==setID).Select(el => el.name));
}

//...consumer thread i
lock(lockObj)
{
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID);
    //if there was a context switch here to some thread i+1
    //   and that thread found uniqueJobs that also were found in thread i
    //   then there will be multiple copies of the same job added in the database.
    //   So I put this section in a lock to prevent that.
    saveJobsToDatabase(uniqueJobNames, consumerSetID);
}
//continue consumer thread i...

Answer 1

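Rather than going back to the database to check whether job names are unique, load the relevant information into an in-memory lookup structure, which makes the existence check much faster: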

// Group the stored jobs by set ID; each group becomes a hash set of names.
Dictionary<int, HashSet<string>> jobLookup = db.jobs.GroupBy(j => j.set_ID)
    .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));

This is done only once. After that, every time you need to check uniqueness, use the lookup:

IEnumerable<string> getUniqueJobNames(IEnumerable<job> subJobs, int setID)
{
    // Fall back to an empty set when no jobs are stored for this set ID yet.
    var existingJobs = jobLookup.TryGetValue(setID, out var names) ? names : new HashSet<string>();

    return subJobs.Select(el => el.name)
        .Except(existingJobs);
}
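The lookup idea can be tried in isolation with a small self-contained sketch. It assumes a hypothetical `Job` record and an in-memory list standing in for `db.jobs`; the class and method names are illustrative and do not come from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's job entity.
public record Job(string name, int set_ID);

public static class JobLookupDemo
{
    // Build the lookup once: set ID -> names already stored for that set.
    public static Dictionary<int, HashSet<string>> BuildLookup(IEnumerable<Job> storedJobs) =>
        storedJobs.GroupBy(j => j.set_ID)
                  .ToDictionary(g => g.Key, g => new HashSet<string>(g.Select(j => j.name)));

    // Names in subJobs that are not yet recorded for setID (duplicates removed).
    public static List<string> GetUniqueJobNames(
        Dictionary<int, HashSet<string>> lookup, IEnumerable<Job> subJobs, int setID)
    {
        var names = subJobs.Select(j => j.name);
        return lookup.TryGetValue(setID, out var existing)
            ? names.Except(existing).ToList()   // set difference against the in-memory set
            : names.Distinct().ToList();        // nothing stored for this set yet
    }
}
```

Each check is a hash lookup per candidate name, so its cost depends on the size of the sub-job list rather than on the 3 million stored rows. The trade-off is that all names must fit in memory.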

When you need to insert new sub-jobs, add them to the lookup as well:

lock(lockObj)
{
    // Materialize once: the query is lazy and would otherwise be re-evaluated on every enumeration.
    var uniqueJobNames = getUniqueJobNames(consumedJob.subJobs, consumerSetID).ToList();
    //if there was a context switch here to some thread i+1
    //   and that thread found uniqueJobs that also were found in thread i
    //   then there will be multiple copies of the same job added in the database.
    //   So this section stays in a lock to prevent that.
    saveJobsToDatabase(uniqueJobNames, consumerSetID);

    // Record the newly saved names so later checks see them.
    if(!jobLookup.TryGetValue(consumerSetID, out var existingJobs))
    {
        jobLookup.Add(consumerSetID, new HashSet<string>(uniqueJobNames));
    }
    else
    {
        existingJobs.UnionWith(uniqueJobNames);
    }
}
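A possible variation, not part of the answer above, is to let each thread "claim" names in memory under a short per-set lock and do the slow database write outside any lock. The sketch below assumes a hypothetical `JobNameRegistry` class that would be seeded from the database at startup; it relies on `HashSet<T>.Add` returning `false` for a name that is already present.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: tracks which job names are already taken per set ID.
public sealed class JobNameRegistry
{
    private readonly ConcurrentDictionary<int, HashSet<string>> _namesBySet = new();

    // Returns only the names this caller is the first to register.
    public List<string> Claim(int setID, IEnumerable<string> candidateNames)
    {
        // GetOrAdd hands every caller the same stored set for a given key.
        var names = _namesBySet.GetOrAdd(setID, _ => new HashSet<string>());
        var claimed = new List<string>();
        lock (names) // per-set lock; only in-memory work happens inside it
        {
            foreach (var name in candidateNames)
            {
                if (names.Add(name)) // false when another thread already claimed it
                    claimed.Add(name);
            }
        }
        return claimed;
    }
}
```

A consumer would then call something like `saveJobsToDatabase(registry.Claim(consumerSetID, subJobNames), consumerSetID)` with the save running outside the lock. Under this assumption a failed save would leave names claimed but unsaved, so a real implementation would need to release or retry them.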

Efficient way to compute a set difference across multiple threads
