I have a List<Keyword>, where the Keyword class is:
public string keyword;
public List<int> ids;
public int hidden;
public int live;
public bool worked;
Each Keyword has its own keyword string and a list of 20 IDs; live defaults to 1 and hidden to 0.
I just need to iterate over the whole main list and invalidate the keywords that share too many IDs, so I compare every pair, and if the second keyword has 6 or more IDs in common with the first, its hidden is set to 1 and its live to 0.
The algorithm is very basic, but it takes a long time when the main list contains many elements.
I am trying to figure out whether there is any way to speed it up.
The basic algorithm I use is:
foreach (Keyword main_keyword in lista_de_keywords_live)
{
    if (main_keyword.worked) {
        continue;
    }
    foreach (Keyword keyword_to_compare in lista_de_keywords_live)
    {
        // skip keywords already invalidated and the keyword itself
        if (keyword_to_compare.worked || keyword_to_compare == main_keyword) continue;
        int n_ids_same = 0;
        foreach (int id in main_keyword.ids)
        {
            if (keyword_to_compare.ids.IndexOf(id) >= 0)
            {
                if (++n_ids_same >= 6) break;
            }
        }
        // 6 or more shared ids: invalidate the compared keyword
        if (n_ids_same >= 6)
        {
            keyword_to_compare.hidden = 1;
            keyword_to_compare.live = 0;
            keyword_to_compare.worked = true;
        }
    }
}
Answer 0 (score: 2)
The code below is an example of how you would use a HashSet
for your problem. However, I would not recommend using it in this scenario. On the other hand, the idea of sorting the ids to make the comparison faster still applies.
Run it in a Console Project to try it out.
Notice that once I'm done adding new ids to a keyword, I sort them. This makes the comparison faster later on.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

namespace KeywordExample
{
    public class Keyword
    {
        public List<int> ids;
        public int hidden;
        public int live;
        public bool worked;

        public Keyword()
        {
            ids = new List<int>();
            hidden = 0;
            live = 1;
            worked = false;
        }

        public override string ToString()
        {
            StringBuilder s = new StringBuilder();
            if (ids.Count > 0)
            {
                s.Append(ids[0]);
                for (int i = 1; i < ids.Count; i++)
                {
                    s.Append(',' + ids[i].ToString());
                }
            }
            return s.ToString();
        }
    }

    public class KeywordComparer : EqualityComparer<Keyword>
    {
        public override bool Equals(Keyword k1, Keyword k2)
        {
            int equals = 0;
            int i = 0;
            int j = 0;
            //based on sorted ids
            while (i < k1.ids.Count && j < k2.ids.Count)
            {
                if (k1.ids[i] < k2.ids[j])
                {
                    i++;
                }
                else if (k1.ids[i] > k2.ids[j])
                {
                    j++;
                }
                else
                {
                    equals++;
                    i++;
                    j++;
                }
            }
            return equals >= 6;
        }

        public override int GetHashCode(Keyword keyword)
        {
            return 0; //notice that using the same hash for all keywords gives you an O(n^2) time complexity though.
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            List<Keyword> listOfKeywordsLive = new List<Keyword>();
            //add some values
            Random random = new Random();
            int n = 10;
            int sizeOfMaxId = 20;
            for (int i = 0; i < n; i++)
            {
                var newKeyword = new Keyword();
                for (int j = 0; j < 20; j++)
                {
                    newKeyword.ids.Add(random.Next(sizeOfMaxId) + 1);
                }
                newKeyword.ids.Sort(); //sorting the ids
                listOfKeywordsLive.Add(newKeyword);
            }

            //solution here
            HashSet<Keyword> set = new HashSet<Keyword>(new KeywordComparer());
            set.Add(listOfKeywordsLive[0]);
            for (int i = 1; i < listOfKeywordsLive.Count; i++)
            {
                Keyword keywordToCompare = listOfKeywordsLive[i];
                if (!set.Add(keywordToCompare))
                {
                    keywordToCompare.hidden = 1;
                    keywordToCompare.live = 0;
                    keywordToCompare.worked = true;
                }
            }

            //print all keywords to check
            Console.WriteLine(set.Count + "/" + n + " inserted");
            foreach (var keyword in set)
            {
                Console.WriteLine(keyword);
            }
        }
    }
}
Answer 1 (score: 1)
The obvious source of the inefficiency is the way you compute the intersection of the two (ids) lists: the algorithm is O(n^2). This is exactly the problem relational databases solve for every join, and your approach would be called a loop join. The main efficient strategies are the hash join and the merge join. For your scenario I would guess the latter could work better, but you can also try HashSets if you like.
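For a single pair of keywords, the hash-join idea looks roughly like this. This is only a minimal sketch; the helper name SharesAtLeastSixIdsHashed and the early exit at 6 matches are my own assumptions, not code from this answer:

    // Hash-join sketch: hash one keyword's ids, then probe with the other's.
    static bool SharesAtLeastSixIdsHashed(List<int> a, List<int> b)
    {
        // Build side (assumes the ids within a keyword are distinct).
        var probe = new HashSet<int>(a);
        int shared = 0;
        // Probe side: look each id of the other keyword up in the set.
        foreach (int id in b)
        {
            if (probe.Contains(id) && ++shared >= 6)
            {
                return true; // early exit once enough ids match
            }
        }
        return false;
    }

Note that this builds a HashSet on every comparison, so compared with a merge over pre-sorted lists it trades the one-off sort for extra allocations per pair.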
The second source of inefficiency is that everything is done twice. Since (a join b) equals (b join a), you do not need two full passes over the whole List<Keyword>.
In fact, you only need to compare each keyword against the ones that have not already been marked as duplicates.
Using some code from here, you can write the algorithm as follows:
Parallel.ForEach(list, k => k.ids.Sort());

List<Keyword> result = new List<Keyword>();
foreach (var k in list)
{
    if (result.Any(r => r.ids.IntersectSorted(k.ids, Comparer<int>.Default)
                         .Skip(5)
                         .Any()))
    {
        k.hidden = 1;
        k.live = 0;
        k.worked = true;
    }
    else
    {
        result.Add(k);
    }
}
If you replace the LINQ with a method that works directly on indices (see the link above), I would guess it would be a bit faster.
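As a rough sketch of that index-only replacement (the helper name SharesAtLeast and its threshold parameter are assumptions of mine, not code from the linked answer), walking both sorted lists in lock step and stopping as soon as the threshold is reached:

    // Merge-style counting over two sorted id lists with early exit.
    // Assumes both lists are already sorted, as done by the Parallel.ForEach above.
    static bool SharesAtLeast(List<int> a, List<int> b, int threshold)
    {
        int i = 0, j = 0, shared = 0;
        while (i < a.Count && j < b.Count)
        {
            if (a[i] < b[j]) i++;
            else if (a[i] > b[j]) j++;
            else
            {
                if (++shared >= threshold) return true; // early exit once enough ids match
                i++;
                j++;
            }
        }
        return false;
    }

The check inside the loop would then read result.Any(r => SharesAtLeast(r.ids, k.ids, 6)), which avoids allocating the intermediate LINQ enumerables.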