Question

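I have a method that finds the first occurrences in a list of words. wordSet is the set of words I need to check for. The list represents a text, so its words are in order. So if pwWords has the following elements {This,is,good,boy,and,this,girl,is,bad} and wordSet has {this,is}, the method should add true only for the first two elements. My question is: is there a faster way to do this? If pwWords has more than a million elements and wordSet more than ten thousand, it runs very slowly.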

public List<bool> getFirstOccurances(List<string> pwWords)
{
    var firstOccurance = new List<bool>();
    var wordSet = new List<String>(WordsWithFDictionary.Keys);
    foreach (var pwWord in pwWords)
    {
        if (wordSet.Contains(pwWord))
        {
            firstOccurance.Add(true);
            wordSet.Remove(pwWord);
        }
        else
        {
            firstOccurance.Add(false);
        }
    }
    return firstOccurance;
}

Answer 1

Another approach is to use a HashSet for wordSet:

public List<bool> getFirstOccurances(List<string> pwWords)
{
    var wordSet = new HashSet<string>(WordsWithFDictionary.Keys);
    return pwWords.Select(word => wordSet.Contains(word)).ToList();
}

HashSet.Contains is an O(1) operation, whereas List.Contains loops through the items until it finds a match.
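Note that a plain Contains check marks every occurrence of a word that is in the set, while the question asks for the first occurrence only. A minimal sketch of one way to keep the first-occurrence behaviour with O(1) lookups follows. The class name FirstOccurrenceSketch, the method MarkFirstOccurrences, and the optional comparer parameter are hypothetical names introduced for illustration, not part of the original answer.

```csharp
using System.Collections.Generic;

public static class FirstOccurrenceSketch
{
    // Hypothetical helper: returns one flag per input word, true only the
    // first time a word from wordKeys is encountered.
    public static List<bool> MarkFirstOccurrences(
        IReadOnlyList<string> words,
        IEnumerable<string> wordKeys,
        IEqualityComparer<string> comparer = null)
    {
        // Copy the keys so the caller's collection is never mutated.
        // A null comparer falls back to the default (case-sensitive) one.
        var remaining = new HashSet<string>(wordKeys, comparer);
        var flags = new List<bool>(words.Count);

        foreach (var word in words)
        {
            // HashSet<T>.Remove returns true only if the item was present,
            // so a repeated word yields false on every later occurrence.
            flags.Add(remaining.Remove(word));
        }

        return flags;
    }
}
```

With the example from the question and StringComparer.OrdinalIgnoreCase passed as the comparer, only the first two words are flagged true. With the default comparer, "This" does not match "this", so the flags fall on the later lowercase "this" and the first "is" instead.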

For even better performance, create wordSet only once if possible.

public class FirstOccurances
{
    private readonly HashSet<string> _wordSet;

    public FirstOccurances(IEnumerable<string> wordKeys)
    {
        _wordSet = new HashSet<string>(wordKeys);
    }

    public List<bool> GetFor(List<string> words)
    {
        return words.Select(word => _wordSet.Contains(word)).ToList();
    }
}

Then use it:

var occurrences = new FirstOccurances(WordsWithFDictionary.Keys);

// Now you can effectively search for occurrences multiple times
var result = occurrences.GetFor(pwWords);
var anotherResult = occurrences.GetFor(anotherPwWords);

Because each item of pwWords can be checked independently of the others, you can try Parallel LINQ if the order of the items is not important:

public List<bool> GetFor(List<string> words)
{
    return words.AsParallel().Select(word => _wordSet.Contains(word)).ToList();
}
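If the flags need to line up with the positions of the input words, PLINQ's AsOrdered() can be added so the result keeps the source order, at some cost in throughput. The following is a sketch only: ParallelLookupSketch and ContainsFlags are hypothetical names, and it assumes the set is not modified while the query runs. Like the answer's version, it reports membership for every occurrence, not just the first.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParallelLookupSketch
{
    // Hypothetical helper: one membership flag per word, in input order.
    public static List<bool> ContainsFlags(
        IEnumerable<string> words, HashSet<string> wordSet)
    {
        return words
            .AsParallel()
            .AsOrdered() // preserve the order of the source sequence
            .Select(word => wordSet.Contains(word)) // read-only lookup
            .ToList();
    }
}
```

Whether the parallel version is actually faster depends on the input size, since a hash lookup is already cheap; measuring both variants on real data is the safest way to decide.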

Find first occurrences of String in a list faster
