Using Regex to get word frequency (count) as a property in a LINQ object

Time: 2010-12-01 21:46:40

Tags: c# regex linq

I'm trying to take this post and adapt it to my own purposes, but I can't figure out how.

Here is the starting query:

string input = sb.ToString();
string[] keywords = new[] { "i","be", "with", "are", "there", "use", "still", "do","out", "so", "will", "but", "if", "can", "your", "what", "just", "from", "all", "get", "about", "this","t", "is","and", "the", "", "a", "to", "http" ,"you","my", "for", "in", "of", "ly" , "com", "it", "on","s", "that", "bit", "at", "have", "m", "rt",  "an", "was", "as", "ll", "not", "me" };
Regex regex = new Regex("\\w+");
var stuff = regex.Matches(input)
    .OfType<Match>()
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .OrderByDescending(c => c.Count())
    .ThenBy(c => c.Key);

But I want to be able to get the COUNT (frequency) of each Key value as well as the value itself, so that I can store both in my database.

foreach (var item in stuff)
{
    string query = String.Format("INSERT INTO sg_top_words (sg_word, sg_count) VALUES ('{0}','{1}')", item.Key, item.COUNT???);
    cmdIns = new SqlCommand(query, conn);
    cmdIns.CommandType = CommandType.Text;
    cmdIns.ExecuteNonQuery();
    cmdIns.Dispose();
}

Thanks

1 Answer:

Answer 0: (score: 3)

Assuming the query is almost what you're after, this tweak should do it:

var stuff = regex.Matches(input)
    .Cast<Match>() // We're confident everything will be a Match!
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .OrderByDescending(g => g.Count)
    .ThenBy(g => g.Word);

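Now the sequence will be of an anonymous type, with Word and Count properties.

If you're just inserting them into the database, do you really need to order the results? Could you just use this: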
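As a minimal sketch of what the projected sequence looks like when consumed, the snippet below runs the same query over a made-up sample string with a shortened stop-word list (both are assumptions for illustration, not the asker's data) and reads the Word and Count properties in a loop.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input and a shortened stop-word list, for illustration only.
string input = "The cat and the dog. The Dog barks; the cat naps.";
string[] keywords = { "the", "and" };
Regex regex = new Regex("\\w+");

var stuff = regex.Matches(input)
    .Cast<Match>()
    .Select(m => m.Value.ToLowerInvariant())
    .Where(w => !keywords.Contains(w))
    .GroupBy(w => w)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .OrderByDescending(g => g.Count)
    .ThenBy(g => g.Word)
    .ToList(); // materialize once so the query is not re-run on each enumeration

// Each item exposes the word and its frequency as ordinary properties.
foreach (var item in stuff)
{
    Console.WriteLine($"{item.Word}: {item.Count}");
}
```

In the question's loop, this means item.Word and item.Count take the place of item.Key and item.COUNT???.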

var stuff = regex.Matches(input)
    .Cast<Match>() // We're confident everything will be a Match!
    .Select(c => c.Value.ToLowerInvariant())
    .Where(c => !keywords.Contains(c))
    .GroupBy(c => c)
    .Select(g => new { Word = g.Key, Count = g.Count() });
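For the insert itself, one possible shape is sketched below. It is not the asker's code: InsertWord is a hypothetical helper, the sample input and stop words are made up, and the "@word"/"@count" parameter names assume a provider that accepts that syntax (SQL Server's does). Passing values as parameters rather than through String.Format avoids quoting problems when a word contains an apostrophe.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: inserts one row using command parameters.
// 'conn' is assumed to be an already-open connection (e.g. the question's SqlConnection).
static void InsertWord(IDbConnection conn, string word, int count)
{
    using (IDbCommand cmd = conn.CreateCommand())
    {
        cmd.CommandText =
            "INSERT INTO sg_top_words (sg_word, sg_count) VALUES (@word, @count)";

        IDbDataParameter wordParam = cmd.CreateParameter();
        wordParam.ParameterName = "@word";
        wordParam.Value = word;
        cmd.Parameters.Add(wordParam);

        IDbDataParameter countParam = cmd.CreateParameter();
        countParam.ParameterName = "@count";
        countParam.Value = count;
        cmd.Parameters.Add(countParam);

        cmd.ExecuteNonQuery();
    }
}

// Made-up input and stop words, for illustration only.
string input = "To be or not to be";
string[] keywords = { "to", "or" };

// The unordered query: group and count, with no sorting.
var stuff = Regex.Matches(input, "\\w+")
    .Cast<Match>()
    .Select(m => m.Value.ToLowerInvariant())
    .Where(w => !keywords.Contains(w))
    .GroupBy(w => w)
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .ToList();

foreach (var item in stuff)
{
    // No database is opened in this sketch; with a real open connection you would call:
    // InsertWord(conn, item.Word, item.Count);
    Console.WriteLine($"would insert: {item.Word} -> {item.Count}");
}
```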