I use the following code to extract words from a string input. How can I get the number of occurrences of each word?
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.Select(g => g.Key);
Answer 0 (score: 4)
Instead of Regex.Split, you can use string.Split and get the count for each word, like:
string str = "Some string with Some string repeated";
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
If you want to keep only the words that are repeated at least 10 times, you can add the condition Where(grp => grp.Count() >= 10) before Select.
To print the result:
foreach (var item in result)
{
Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}
Output:
Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1
For case-insensitive grouping, you can replace the current GroupBy with:
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
So your query would be:
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
.Where(grp => grp.Count() >= 10)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
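The pieces above can be put together in one small helper. This is only a minimal sketch: the names WordCounter and CountWords are hypothetical, and it assumes that words are separated by single spaces, as in the example string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordCounter
{
    // Hypothetical helper: counts space-separated words, ignoring case,
    // and keeps only the words that occur at least minCount times.
    public static Dictionary<string, int> CountWords(string text, int minCount)
    {
        return text.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                   // Group case-insensitively; the key is the first spelling seen.
                   .GroupBy(w => w, StringComparer.InvariantCultureIgnoreCase)
                   // Compute each count once and carry it along.
                   .Select(g => new { Word = g.Key, Count = g.Count() })
                   .Where(x => x.Count >= minCount)
                   // Use the same comparer so lookups are case-insensitive too.
                   .ToDictionary(x => x.Word, x => x.Count,
                                 StringComparer.InvariantCultureIgnoreCase);
    }
}
```

Calling WordCounter.CountWords(str, 10) should then give the same words as the query above, with counts that can be looked up by word.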
Answer 1 (score: 2)
Try this:
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Select(g => new {key = g.Key, count = g.Count()});
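One thing to watch for with Regex.Split: when the input starts or ends with a non-word character (for example a trailing period), the returned array contains an empty string, which would then be counted as a word. A possible way to handle that is sketched below; the names RegexWordCounter and Count are hypothetical, and dropping empty entries is an assumption about what you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordCounter
{
    // Hypothetical helper: splits on runs of non-word characters and
    // returns a word -> count map, skipping the empty strings that
    // Regex.Split emits for leading or trailing separators.
    public static Dictionary<string, int> Count(string input)
    {
        return Regex.Split(input, @"\W+")
                    .Where(w => w.Length > 0)   // drop empty entries
                    .GroupBy(w => w)
                    .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The AsEnumerable() call from the original query is left out here, since an array already supports the LINQ operators directly.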
Answer 2 (score: 0)
Remove the Select statement to keep the IGrouping, which you can use to view both the keys and the count values.
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10);
foreach (var wordGrouping in words)
{
var word = wordGrouping.Key;
var count = wordGrouping.Count();
}
Answer 3 (score: 0)
You can produce a dictionary like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.ToDictionary(g => g.Key, g => g.Count());
Or, if you want to avoid computing the count twice, like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Select(g => new { g.Key, Count = g.Count() })
.Where(g => g.Count > 10)
.ToDictionary(g => g.Key, g => g.Count);
Now you can get the count of a word like this (assuming the word "foo" appears more than 10 times in input):
var fooCount = words["foo"];
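Note that the indexer throws a KeyNotFoundException when the word is not in the dictionary, which here includes every word that did not pass the filter. If that case can happen, TryGetValue is a safer lookup. The sketch below uses a hypothetical helper named GetCountOrZero; returning 0 for a missing word is an assumption, not something the dictionary does by itself.

```csharp
using System.Collections.Generic;

public static class WordLookup
{
    // Hypothetical helper: returns the stored count for a word,
    // or 0 when the word is not present in the dictionary.
    public static int GetCountOrZero(IDictionary<string, int> words, string word)
    {
        int count;
        return words.TryGetValue(word, out count) ? count : 0;
    }
}
```

With this, WordLookup.GetCountOrZero(words, "foo") would not throw even when "foo" occurs 10 times or fewer.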