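I need to display the most frequently used word in a text using C#. I'm using WinForms with VS2012.
The code below runs, but it displays "I like apples."
I could break the text up word by word by hand so that it shows "apples", but that would not be efficient...
I'm new to programming, so simple code (it must be in C#) would be great :)
Thanks in advance, everyone~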
    string[] source = { "I like apples.", "I like red apples.",
                        "I like red apples than green apples." };
    var frequencies = new Dictionary<string, int>();
    string highestWord = null;
    int highestFreq = 0;
    foreach (string word in source)
    {
        int freq;
        frequencies.TryGetValue(word, out freq);
        freq += 1;
        if (freq > highestFreq)
        {
            highestFreq = freq;
            highestWord = word;
        }
        frequencies[word] = freq;
    }
    this.lblFreqWords.Text = highestWord;
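Answer 0: (score: 2)
That happens because this line iterates over each sentence, not over each word:
    foreach (string word in source) // source is a collection of sentences
Without rewriting the whole program, the quickest way to get individual words out of your current collection is probably to join the sentences into one string (with string.Join) and then split it: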
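    // RemoveEmptyEntries drops the empty strings produced where '.' is followed by ' '
    var words = string.Join(" ", source)
                      .Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);
    foreach (var word in words)
    {
        ...
    }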
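To make the join-then-split step concrete, here is a minimal sketch. The class and method names (`SplitDemo`, `SplitWords`) are hypothetical and only for illustration; the separators are the same two characters used above.

```csharp
using System;

static class SplitDemo
{
    // Hypothetical helper: joins the sentences with spaces, then splits the
    // result on spaces and periods, discarding empty entries.
    public static string[] SplitWords(string[] sentences)
    {
        var separators = new[] { ' ', '.' };
        return string.Join(" ", sentences)
                     .Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Calling `SplitDemo.SplitWords(new[] { "I like apples.", "I like red apples." })` should give the seven words I, like, apples, I, like, red, apples. Without `StringSplitOptions.RemoveEmptyEntries` the result would also contain empty strings wherever a period is followed by a space or ends the text, and those would be counted as a "word".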
Answer 1: (score: 1)
Try this:
    string[] source = { "I like apples.", "I like red apples.",
                        "I like red apples than green apples." };
    var frequencies = new Dictionary<string, int>();
    string highestWord = null;
    int highestFreq = 0;
    // Join the sentences into one string, then split it into words
    var message = string.Join(" ", source);
    var splichar = new char[] { ' ', '.' };
    // RemoveEmptyEntries keeps empty strings from being counted as words
    var single = message.Split(splichar, StringSplitOptions.RemoveEmptyEntries);
    foreach (var item in single)
    {
        int freq;
        frequencies.TryGetValue(item, out freq); // freq stays 0 for a new word
        freq += 1;
        if (freq > highestFreq)
        {
            highestFreq = freq;
            highestWord = item.Trim();
        }
        frequencies[item] = freq;
    }
    this.lblFreqWords.Text = highestWord;
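Answer 2: (score: 1)
I would probably use LINQ. The following statement returns an ordered IEnumerable<KeyValuePair<string, int>> that (in theory) represents each word and its number of occurrences. You will need to add more cases for "special characters" such as other punctuation, but it is a good start.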
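    char[] wordBreaks = new[] { ' ', '.', ',', '\'' };
    // Split each sentence into words, group identical words, and sort by count
    return source.SelectMany(c => c.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
                 .GroupBy(c => c)
                 .Select(c => new KeyValuePair<string, int>(c.Key, c.Count()))
                 .OrderByDescending(c => c.Value);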
Of course, once you have that, you can take thatValue.First().Key to get the most frequently used word.
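As a rough sketch of how the LINQ approach could be wrapped up to return just the top word, something like the following might work. The names `WordFrequency` and `MostFrequentWord` are made up for this example, and grouping case-insensitively is an assumption that the original answer does not make.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordFrequency
{
    // Hypothetical helper: returns the most frequent word, or null when
    // the input contains no words at all.
    public static string MostFrequentWord(IEnumerable<string> sentences)
    {
        char[] wordBreaks = { ' ', '.', ',', '\'' };
        return sentences
            .SelectMany(s => s.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
            // Assumption: "Apples" and "apples" count as the same word
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .FirstOrDefault();
    }
}
```

In the WinForms code from the question this could be used as `this.lblFreqWords.Text = WordFrequency.MostFrequentWord(source);`. When several words are tied, the one that appears first in the text is returned, because OrderByDescending is a stable sort.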
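Answer 3: (score: 1)
Grant Winney's answer covers why your program does not work, but there is a better way to split out words than splitting only on spaces and periods. Regular expressions have the symbol \b, which represents a "word boundary", and \w, which matches any letter, digit, or underscore (in .NET this includes Unicode letters by default, not just a-z). So the pattern \b\w+\b means "a word boundary, followed by one or more word characters, followed by a word boundary".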
    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string[] source = { "I like apples.", "I like red apples.",
                        "I like red apples than green apples.",
                        "red red red apples, Yum!" };
    var frequencies = new Dictionary<string, int>();
    int highestFreq = 0;
    var combinedString = string.Join(" ", source);
    var matches = Regex.Matches(combinedString, @"\b\w+\b");
    foreach (Match match in matches)
    {
        var word = match.Value;
        int freq;
        frequencies.TryGetValue(word, out freq);
        freq += 1;
        if (freq > highestFreq)
        {
            highestFreq = freq;
        }
        frequencies[word] = freq;
    }
    // This will hold every word that shares the highest frequency
    var highestWords = frequencies.Where(x => x.Value == highestFreq).Select(x => x.Key).ToList();
    Console.WriteLine("Highest freq: {0}", highestFreq);
    foreach (var word in highestWords)
    {
        Console.WriteLine(word);
    }
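This strips the . (and other punctuation) from your sentences. If you want hyphenated words to show up as one word instead of two, change the pattern to \b[\w-]+\b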
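To see the difference between the two patterns, a small sketch like the one below can help. `WordPatterns.Words` is a hypothetical helper written for this illustration, and the sample text "sun-dried apples" is made up.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class WordPatterns
{
    // Hypothetical helper: returns every match of the pattern in the text.
    // Cast<Match>() is used because MatchCollection is a non-generic
    // collection on older .NET Framework versions.
    public static string[] Words(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

With the text "sun-dried apples", the pattern `@"\b\w+\b"` should yield three words (sun, dried, apples), while `@"\b[\w-]+\b"` should yield two (sun-dried, apples).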