Question

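I am building a word cloud, so I use a regular expression inside a Linq query to split my sentences, group the words, and count them. However, I don't want certain blacklisted words to appear in the cloud, so I put those words in a DataTable (dtBlackList) and check against it with Linq, as shown in the code below: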

var result = (Regex.Split(StringsForWordCloud, @"\W+")
                   .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
                   .Where(q => q.Key.Trim() != "")
                   .Where(q => (dtBlackList.Select("blacklistword = '" + q.Key.Trim() + "'").Count() == 0))
                   .OrderByDescending(g => g.Count())
                   .Select(p => new { Word = p.Key, Count = p.Count() })
              ).Take(200);

Will this query seriously hurt performance? Is this the right way to check against the DataTable?

Answer 1

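This LINQ query will hurt performance, because it executes a query against the DataTable for every word found by the Regex.Split operation. I am referring to this line of code: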

.Where(q => (dtBlackList.Select("blacklistword = '" + q.Key.Trim() + "'").Count() == 0))

In the project I am working on right now I have had to deal with a lot of performance problems caused by situations similar to this one.

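In general, it is not good practice to execute a query for each item in order to check or complete data that has been extracted from the database.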

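In your case, I think it would be better to write a single query that extracts the blacklisted words, and then exclude that list from the set of words you have just extracted. Like this: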

var words = Regex.Split(StringsForWordCloud, @"\W+")
    .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
    .Where(q => q.Key.Trim() != "")
    .OrderByDescending(g => g.Count())
    .Select(p => new { Word = p.Key, Count = p.Count() });

// Now extract all the words in the blacklist
IEnumerable<string> blackList = dtBlackList...

// Now exclude them from the set of words all at once
var result = words.Where(w => !blackList.Contains(w.Word))
    .OrderByDescending(g => g.Count)   // Count is a property of the anonymous type
    .Take(200);
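
One way the elided blacklist extraction above could be filled in is sketched below. This is only an illustration under stated assumptions, not the answerer's code: the class WordCloudSketch and the method TopWords are hypothetical names, the column name "blacklistword" is taken from the filter string in the question, and the use of a case-insensitive HashSet is my own choice, made so that lookups stay cheap and the match remains case-insensitive (which is also the default behavior of DataTable.Select).

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, introduced only for illustration.
public static class WordCloudSketch
{
    // Counts the words in `text`, skipping every word listed in the
    // "blacklistword" column of `dtBlackList`, and returns the `take`
    // most frequent words with their counts.
    public static List<KeyValuePair<string, int>> TopWords(
        string text, DataTable dtBlackList, int take)
    {
        // Read the DataTable once and keep the words in a case-insensitive set.
        var blackList = new HashSet<string>(
            dtBlackList.Rows.Cast<DataRow>()
                .Select(row => row["blacklistword"] as string) // null for DBNull
                .Where(word => !string.IsNullOrWhiteSpace(word))
                .Select(word => word.Trim()),
            StringComparer.InvariantCultureIgnoreCase);

        return Regex.Split(text, @"\W+")
            .Where(s => s.Length > 0)              // drop empty split results
            .Where(s => !blackList.Contains(s))    // in-memory set lookup, no query
            .GroupBy(s => s, StringComparer.InvariantCultureIgnoreCase)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .Take(take)
            .ToList();
    }
}
```

With this shape the DataTable is scanned a single time, and each word costs one set lookup instead of one DataTable.Select call. It would be called as WordCloudSketch.TopWords(StringsForWordCloud, dtBlackList, 200).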

Using Linq to check values in a DataTable
