Question

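I have an Item object that contains a field named generator_list (a HashSet of strings). I have 8000 of these objects, and for each one I want to see how its generator_list intersects with every other generator_list. I then want to store the intersection counts in a List<int>, which logically will have 8000 elements.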

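The process takes about 8 minutes; with parallel processing it should take only a few minutes, but I don't think I'm doing the parallel part correctly, hence the question. Can anyone tell me whether and how I should modify my code to take advantage of parallel loops?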

The code for my Item object is:

public class Item
{
    public int index { get; set; }
    public HashSet<string> generator_list = new HashSet<string>();
}

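I store all the Item objects in a List<Item> items (8000 elements). I created a method that takes the items (the list I want to compare against) and one item (the one I want to compare), and it looks like this: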

public void Relatedness2(List<Item> compare, Item compare_to)
        {
            int compare_to_length = compare_to.generator_list.Count;
            foreach (Item block in compare)
            {
                int block_length = block.generator_list.Count;
                int both = 0; //this counts the intersection number
                if (compare_to_length < block_length) //to make sure I'm looping  
                                                      //over the smaller set
                {
                    foreach (string word in compare_to.generator_list)
                    {
                        if (block.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                else
                {
                    foreach (string word in block.generator_list)
                    {
                        if (compare_to.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                     // I'd like to store the intersection number, both,   
                     // somewhere so I can effectively use parallel loops
            }

        }
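The two branches inside the loop do the same thing with the roles of the sets swapped: walk the smaller set and probe the larger one. A minimal sketch of that logic as a standalone helper (the name IntersectionCount is hypothetical), written as C# 9 top-level statements:

```csharp
using System;
using System.Collections.Generic;

// Counts how many strings two sets share by iterating the smaller set
// and calling Contains on the larger one.
static int IntersectionCount(HashSet<string> a, HashSet<string> b)
{
    // Pick the set with fewer elements to drive the loop.
    var (small, large) = a.Count <= b.Count ? (a, b) : (b, a);
    int both = 0;
    foreach (string word in small)
    {
        if (large.Contains(word))
        {
            both++;
        }
    }
    return both;
}

var x = new HashSet<string> { "a", "b", "c" };
var y = new HashSet<string> { "b", "c", "d", "e" };
Console.WriteLine(IntersectionCount(x, y)); // prints 2
```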

Finally, my Parallel.ForEach loop is:

Parallel.ForEach(items, (kk, state, index) => Relatedness2(items, kk));

Any suggestions?
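The Parallel.ForEach call above does run Relatedness2 on several threads, but each count ends up in the local variable both and is then discarded. A minimal sketch of one way to keep the results, assuming the sets are passed as a list whose positions play the role of Item.index (for example items.Select(it => it.generator_list).ToList()) and that nothing modifies the sets during the loop; AllIntersections and IntersectionCount are hypothetical names. Each outer iteration allocates its own row and writes only result[i], so no lock or concurrent collection is needed. For 8000 items the full matrix holds 64 million ints, roughly 256 MB.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Counts shared strings by iterating the smaller set (same logic as Relatedness2).
static int IntersectionCount(HashSet<string> a, HashSet<string> b)
{
    var (small, large) = a.Count <= b.Count ? (a, b) : (b, a);
    int both = 0;
    foreach (string word in small)
    {
        if (large.Contains(word)) both++;
    }
    return both;
}

// Hypothetical helper: result[i][j] is the intersection size of input[i] and input[j].
static int[][] AllIntersections(IReadOnlyList<HashSet<string>> input)
{
    int n = input.Count;
    var result = new int[n][];
    // Only the outer loop is parallel. Iteration i builds its own row and stores
    // it in result[i]; no two iterations write the same slot.
    Parallel.For(0, n, i =>
    {
        var row = new int[n];
        for (int j = 0; j < n; j++)
        {
            row[j] = IntersectionCount(input[i], input[j]);
        }
        result[i] = row;
    });
    return result;
}

var sets = new List<HashSet<string>>
{
    new HashSet<string> { "1", "2" },
    new HashSet<string> { "2", "3" },
    new HashSet<string> { "3", "4" },
    new HashSet<string> { "1", "4" },
};
int[][] matrix = AllIntersections(sets);
Console.WriteLine(string.Join(", ", matrix[0])); // prints 2, 1, 0, 1
```

Parallelizing only the outer loop keeps the inner loop cheap and sequential; with thousands of items there are already far more outer iterations than cores.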

Answer 1

Maybe something like this:

 public Dictionary<int, int> Relatedness2(IList<Item> compare, Item compare_to)
        {
            int compare_to_length = compare_to.generator_list.Count;
            var intersectionData = new Dictionary<int, int>();
            foreach (Item block in compare)
            {
                int block_length = block.generator_list.Count;
                int both = 0;
                if (compare_to_length < block_length)
                {
                    foreach (string word in compare_to.generator_list)
                    {
                        if (block.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                else
                {
                    foreach (string word in block.generator_list)
                    {
                        if (compare_to.generator_list.Contains(word))
                        {
                            both = both + 1;
                        }
                    }
                }
                intersectionData[block.index] = both;
            }
            return intersectionData;
        }

和

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

List<Item> items = new List<Item>(8000);
// add to list
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>(); // thread-safe dictionary

var readOnlyItems = items.AsReadOnly(); // if you're sure you won't modify the collection, feel free to use items directly
Parallel.ForEach(readOnlyItems, item =>
{
    dictionary[item.index] = Relatedness2(readOnlyItems, item);
});

I assume that index is unique.

I used a dictionary, but you may want to use your own class. In my example, you can access the data as follows:

var intersectionData = dictionary[1]; // intersection counts for the item with index 1

var countOfIntersectionItemIndex1AndItemIndex3 = dictionary[1][3];
var countOfIntersectionItemIndex3AndItemIndex7 = dictionary[3][7];

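Don't forget the possibility that there is no entry for dictionary[i]: the indexer throws KeyNotFoundException for a missing key rather than returning null, so use TryGetValue when that can happen.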
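A minimal sketch of such a guarded lookup, assuming the ConcurrentDictionary<int, Dictionary<int, int>> shape used above; the helper TryGetIntersection and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical data shaped like the result above: item index -> (other index -> count).
var dictionary = new ConcurrentDictionary<int, Dictionary<int, int>>();
dictionary[1] = new Dictionary<int, int> { [1] = 2, [3] = 1 };

// Returns the count for (i, j), or null when either key is missing,
// instead of letting the indexer throw KeyNotFoundException.
static int? TryGetIntersection(ConcurrentDictionary<int, Dictionary<int, int>> data, int i, int j)
{
    if (data.TryGetValue(i, out var row) && row != null && row.TryGetValue(j, out int count))
    {
        return count;
    }
    return null;
}

Console.WriteLine(TryGetIntersection(dictionary, 1, 3)); // prints 1
Console.WriteLine(TryGetIntersection(dictionary, 5, 3).HasValue); // prints False
```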

Answer 2

Thread-safe collections may be what you are looking for: http://msdn.microsoft.com/en-us/library/dd997305(v=vs.110).aspx

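When working in a multithreaded environment, you need to make sure that you are not manipulating shared data at the same time without synchronizing access.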

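The .NET Framework offers some collection classes that are created specifically for use in concurrent environments, which is what you have when you're using multithreading. These collections are thread-safe, which means that they internally use synchronization to make sure that they can be accessed by multiple threads at the same time.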

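Source: Exam Ref 70-483: Programming in C#, Objective 1.1: Implement multithreading and asynchronous processing, "Using concurrent collections"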

These collections are:

BlockingCollection<T>
ConcurrentBag<T>
ConcurrentDictionary<TKey, TValue>
ConcurrentQueue<T>
ConcurrentStack<T>
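As a minimal sketch of how one of these collections could hold the results in this problem (the variable names are hypothetical, and a list position stands in for Item.index): ConcurrentDictionary<TKey, TValue>.TryAdd can be called from every iteration of Parallel.For without a manual lock.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var sets = new List<HashSet<string>>
{
    new HashSet<string> { "1", "2" },
    new HashSet<string> { "2", "3" },
    new HashSet<string> { "1", "4" },
};

// Thread-safe map: item position -> that item's intersection counts.
var rows = new ConcurrentDictionary<int, int[]>();

Parallel.For(0, sets.Count, i =>
{
    // Intersection size of sets[i] with every set, in list order.
    int[] row = sets.Select(other => sets[i].Intersect(other).Count()).ToArray();
    // TryAdd is safe to call from many threads at once; each key is added only once here.
    rows.TryAdd(i, row);
});

Console.WriteLine(string.Join(", ", rows[2])); // prints 1, 0, 2
```

Because every iteration uses its own key, a preallocated array indexed by i would also work; the concurrent collection matters once iterations can write to the same shared structure.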

Answer 3

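If the Item indexes are contiguous and start at 0, you don't need the Item class at all. Just use a List<HashSet<string>>, which will handle the indexes for you. This solution finds the intersection counts between one item and all the other items in a parallel LINQ query, then runs that over every item of the collection in another parallel LINQ query. Like this: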

using System.Collections.Generic;
using System.Linq;

var items = new List<HashSet<string>>
{
    new HashSet<string> {"1", "2"},
    new HashSet<string> {"2", "3"},
    new HashSet<string> {"3", "4"},
    new HashSet<string> {"1", "4"}
};

// AsOrdered() makes PLINQ keep the source order, so position i in each
// result list still corresponds to items[i].
var intersects = items.AsParallel().AsOrdered().Select(   // outer loop over all items
    item => items.AsParallel().AsOrdered().Select(        // inner loop computing intersections
            item2 => item.Intersect(item2).Count())
        // This ToList creates a single List<int>
        // with the intersection counts for that item
        .ToList())
    // This ToList creates the final List<List<int>>
    // that contains all intersection counts
    .ToList();
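Two details about the query above, sketched below under the same List<HashSet<string>> assumption. PLINQ does not guarantee output order unless you call AsOrdered(), so without it intersects[i] need not line up with items[i]. And because both sides are sets, counting the members of one set that the other contains gives the same number as Intersect(...).Count() without building a new set; with thousands of outer items, parallelizing only the outer query is likely enough.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var items = new List<HashSet<string>>
{
    new HashSet<string> { "1", "2" },
    new HashSet<string> { "2", "3" },
    new HashSet<string> { "3", "4" },
    new HashSet<string> { "1", "4" },
};

// Parallel outer query; AsOrdered keeps intersects[i] aligned with items[i].
List<List<int>> intersects = items
    .AsParallel()
    .AsOrdered()
    .Select(item => items
        // Sequential inner query: counts members of 'item' that 'other' contains,
        // which for two sets equals the size of their intersection.
        .Select(other => item.Count(other.Contains))
        .ToList())
    .ToList();

Console.WriteLine(string.Join(", ", intersects[3])); // prints 1, 0, 1, 2
```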

Parallel loops in C#, accessing the same variable
