Iterating over a list with thousands of elements

Time: 2016-05-29 14:10:39

Tags: c# list for-loop

case 15: {
    for (int i = 0; i < words.Count; i++) {
        if (words[i].Length == 8) {
            var tupled = words[i].ConcatCheck();
            for (int n = 0; n < word.Count; n++)
                if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
                    temp++;
        }
        if (temp >= 2)
            matches.Add(words[i]);
        temp = 0;
    }
    break;
}

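What it does:
The first for loop iterates over a List of words with roughly 248,000 elements, looking for words of length 8. When it finds one, I create a Tuple holding the first and second halves of the word (4 letters each) by calling ConcatCheck(), an extension method I wrote for String. That part is fast and works fine.

What really needs work is the second for loop. Every single 8-letter word triggers this loop, which iterates over a larger List of roughly 267,000 elements and checks whether both items of the Tuple exist in it. If both are found, I add the original word to the list "matches".

This part takes almost 3 minutes to find all matches in my 248k-word dictionary. Is there any way to optimize it or speed it up?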
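The question does not show ConcatCheck() itself. Based only on the description above (an extension method on String that returns a Tuple of the two halves), a minimal sketch might look like the following. The class name StringExtensions and the body are assumptions, not the asker's actual code.

```csharp
using System;

public static class StringExtensions
{
    // Hypothetical reconstruction: splits a word into its first and second halves.
    // For an 8-letter word, each half is 4 letters long.
    public static Tuple<string, string> ConcatCheck(this string word)
    {
        int half = word.Length / 2;
        return Tuple.Create(word.Substring(0, half), word.Substring(half));
    }
}
```

With this sketch, "backfire".ConcatCheck() yields the pair ("back", "fire").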

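2 answers:

Answer 0: (score: 2)

If you only want to check whether a word exists in a collection, use a HashSet instead of a List or an Array. The HashSet class is optimized for Contains checks.

Example

With the following code, I found all 8-letter words that consist of two 4-letter words in the English dictionary (GitHub version) in less than 50 milliseconds.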

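using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;

// Download the word list (one word per line); this happens before timing starts.
WebClient client = new WebClient();
string dictionary = client.DownloadString(
    @"https://raw.githubusercontent.com/dwyl/english-words/master/words.txt");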

Stopwatch watch = new Stopwatch();
watch.Start();

HashSet<string> fourLetterWords = new HashSet<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 4) continue;

        fourLetterWords.Add(line);
    }
}

List<string> matches = new List<string>();

using (StringReader reader = new StringReader(dictionary))
{
    while (true)
    {
        string line = reader.ReadLine();
        if (line == null) break;
        if (line.Length != 8) continue;

        if (fourLetterWords.Contains(line.Substring(0, 4)) &&
            fourLetterWords.Contains(line.Substring(4, 4)))
            matches.Add(line);
    }
}

watch.Stop();

Why is your code so slow?

for (int n = 0; n < word.Count; n++)
    if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
        temp++;

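This part is one of the culprits. Instead of checking "Are both parts contained in my array?", you are checking "Are 2 or more of my 2 words contained in an array?"

You can optimize this part by breaking out of the loop as soon as both words have been found.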

if (word[n] == tupled.Item1 || word[n] == tupled.Item2)
    if (++temp >= 2) break;
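If the intent is the first question ("are both parts contained?"), one possible way to express it is to track each half separately rather than counting hits. The sketch below is only an illustration: the class name HalfSearch and the method ContainsBoth are hypothetical, and the parameter word mirrors the list name used in the question.

```csharp
using System;
using System.Collections.Generic;

public static class HalfSearch
{
    // Returns true only when both halves occur somewhere in the list.
    public static bool ContainsBoth(IList<string> word, string first, string second)
    {
        bool foundFirst = false, foundSecond = false;
        for (int n = 0; n < word.Count; n++)
        {
            if (word[n] == first) foundFirst = true;
            if (word[n] == second) foundSecond = true;
            // Stop scanning as soon as both halves have been seen.
            if (foundFirst && foundSecond) return true;
        }
        return false;
    }
}
```

Unlike a hit counter, this also works when both halves are the same word. It is still a linear scan per 8-letter word, so the HashSet approach above should remain much faster.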

Further optimization is possible by pre-sorting the words by length or alphabetically (depending on how often you run this search).
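As a rough illustration of the alphabetical pre-sorting idea, the list could be sorted once and then queried with a binary search, which needs about log2(n) comparisons per lookup instead of n. The class SortedLookup and its method names are hypothetical; only List<T>.Sort and List<T>.BinarySearch are standard library calls.

```csharp
using System;
using System.Collections.Generic;

public static class SortedLookup
{
    // Sorts a copy of the words once so that every later lookup can use binary search.
    public static List<string> Prepare(IEnumerable<string> words)
    {
        var sorted = new List<string>(words);
        sorted.Sort(StringComparer.Ordinal);
        return sorted;
    }

    // List<T>.BinarySearch returns a non-negative index when the item is found.
    // The same comparer must be used for sorting and searching.
    public static bool Contains(List<string> sorted, string word)
    {
        return sorted.BinarySearch(word, StringComparer.Ordinal) >= 0;
    }
}
```

Sorting costs O(n log n) up front, so this pays off mainly when the same list is searched many times.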

Answer 1: (score: -1)

O(n) using a dictionary:

// The list contents are elided in the original answer.
IList<string> words1 = new List<string>{...};
IList<string> words2 = new List<string>{...};
var wordsWithLengthOf8 = words1.Where(w => w.Length == 8).ToList();
// Build the lookup from words2 (the list being searched). Keying it by the
// 8-letter words themselves could never match a 4-letter half.
// Distinct() avoids a duplicate-key exception in ToDictionary.
IDictionary<string, string> words2Dic = words2.Distinct().ToDictionary(w => w);
IList<string> matches = new List<string>();

for (int i = 0; i < wordsWithLengthOf8.Count; i++)
{
    var tupled = wordsWithLengthOf8[i].ConcatCheck();
    var isMatch = words2Dic.ContainsKey(tupled.Item1) && words2Dic.ContainsKey(tupled.Item2);
    if (isMatch)
    {
        matches.Add(wordsWithLengthOf8[i]);
    }
}