Question

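I need to use C# to display the most frequently used word in a text. I'm using WinForms in VS2012.

The code below works, but it displays "I like apples." instead of a single word.

I could break the sentences apart word by word so that it shows "apples", but that doesn't seem efficient...

I'm new to programming, so simple code (it must be in C#) would be great :)

Thanks in advance, everyone!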

string[] source = { "I like apples.", "I like red apples.",
                    "I like red apples than green apples." };

var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;

foreach (string word in source)
{
    int freq;
    frequencies.TryGetValue(word, out freq);
    freq += 1;

    if (freq > highestFreq)
    {
        highestFreq = freq;
        highestWord = word;
    }
    frequencies[word] = freq;
}

this.lblFreqWords.Text = highestWord;

Answer 1

This happens because this line iterates over each sentence, not over each word:

foreach (string word in source)  // source is a collection of sentences
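To see what that loop actually counts, here is a minimal sketch that replays it in isolation. The names SentenceCountDemo and CountItems are hypothetical, chosen only for illustration. Every dictionary key is a whole sentence, so no individual word is ever counted.

```csharp
using System.Collections.Generic;

// Hypothetical demo class: replays the question's counting loop on its own.
public static class SentenceCountDemo
{
    // Returns the dictionary that the question's loop builds.
    public static Dictionary<string, int> CountItems(string[] source)
    {
        var frequencies = new Dictionary<string, int>();
        foreach (string item in source) // each item is a whole sentence, not a word
        {
            int freq;
            frequencies.TryGetValue(item, out freq); // freq stays 0 for unseen keys
            frequencies[item] = freq + 1;
        }
        return frequencies;
    }
}
```

With the question's three sentences this yields three keys, each with a count of 1. Because the original loop only replaces highestWord when freq > highestFreq (a strict comparison), the first sentence to reach a count of 1, "I like apples.", is the one that stays on the label.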

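Without rewriting the whole program, probably the quickest way to get the individual words out of your current collection is to:

1. Join all the sentences into one long string (using string.Join), then
2. Split that string on " " (space) to get the individual words, and also on "." to strip out the periods.

Try this: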

var words = string.Join(" ", source).Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);

foreach (var word in words)
{
    ...
}
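Filled in end to end, the join-and-split approach might look like the sketch below. It reuses the counting loop from the question; JoinSplitCounter and MostFrequentWord are hypothetical names, and StringSplitOptions.RemoveEmptyEntries keeps the empty strings that appear between "." and " " from being counted as words.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class: join, split, then count with the question's loop.
public static class JoinSplitCounter
{
    public static string MostFrequentWord(string[] source)
    {
        // One long string, split on spaces and periods, with empty entries dropped.
        string[] words = string.Join(" ", source)
            .Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);

        var frequencies = new Dictionary<string, int>();
        string highestWord = null; // stays null if there are no words
        int highestFreq = 0;

        foreach (string word in words)
        {
            int freq;
            frequencies.TryGetValue(word, out freq); // 0 for a word not seen yet
            freq += 1;

            // Strict '>' means that on a tie, the word that reached the count first is kept.
            if (freq > highestFreq)
            {
                highestFreq = freq;
                highestWord = word;
            }
            frequencies[word] = freq;
        }
        return highestWord;
    }
}
```

In the form you would then assign the result, for example this.lblFreqWords.Text = JoinSplitCounter.MostFrequentWord(source); which shows "apples" for the question's three sentences.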

Answer 2

Try this:

string[] source = { "I like apples.", "I like red apples.",
                    "I like red apples than green apples." };

var frequencies = new Dictionary<string, int>();
string highestWord = null;
int highestFreq = 0;

// Combine all the sentences, then split them into individual words.
// RemoveEmptyEntries drops the empty strings left between "." and " ".
var message = string.Join(" ", source);
var splitChars = new char[] { ' ', '.' };
var words = message.Split(splitChars, StringSplitOptions.RemoveEmptyEntries);

foreach (var item in words)
{
    int freq;
    frequencies.TryGetValue(item, out freq);
    freq += 1;

    if (freq > highestFreq)
    {
        highestFreq = freq;
        highestWord = item;
    }
    frequencies[item] = freq;
}

this.lblFreqWords.Text = highestWord;

Answer 3

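I would probably use LINQ. The following statement returns an ordered IEnumerable<KeyValuePair<string, int>> that (in theory) represents each word and the number of times it occurs. You will need to add more cases for "special characters" such as punctuation, but it's a good start.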

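// Requires: using System; using System.Collections.Generic; using System.Linq;
char[] wordBreaks = new[] { ' ', '.', ',', '\'' };

// RemoveEmptyEntries keeps empty strings (e.g. after a trailing '.') out of the counts.
return source.SelectMany(c => c.Split(wordBreaks, StringSplitOptions.RemoveEmptyEntries))
             .GroupBy(c => c)
             .Select(c => new KeyValuePair<string, int>(c.Key, c.Count()))
             .OrderByDescending(c => c.Value);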

Once you have that result, you can call .First().Key on it to get the most frequently used word.
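As a sketch, the query and the .First().Key step can be wrapped into two small methods. LinqWordCounter, WordFrequencies and MostFrequentWord are hypothetical names. This version uses FirstOrDefault instead of First, because First throws an InvalidOperationException when the input contains no words.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the LINQ query from this answer.
public static class LinqWordCounter
{
    static readonly char[] WordBreaks = { ' ', '.', ',', '\'' };

    // Each distinct word paired with its count, most frequent first.
    public static IEnumerable<KeyValuePair<string, int>> WordFrequencies(string[] source)
    {
        return source
            .SelectMany(s => s.Split(WordBreaks, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value);
    }

    // Returns null for input with no words instead of throwing.
    public static string MostFrequentWord(string[] source)
    {
        var top = WordFrequencies(source).FirstOrDefault();
        return top.Key; // default(KeyValuePair<string, int>) has a null Key
    }
}
```

Enumerable.OrderByDescending is a stable sort, and GroupBy yields groups in the order their keys first appear, so when two words tie, the one that occurs first in the text comes out on top.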

Answer 4

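Grant Winney's answer explains why your program doesn't work, but there is a better way to split text into words than splitting only on spaces and periods. Regular expressions have the symbol \b, which stands for a "word boundary", and \w, which matches any word character: a letter, a digit, or the underscore. So the pattern \b\w+\b means "a word boundary, followed by one or more word characters, followed by a word boundary".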
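To see what the pattern picks out on its own, a helper like the following can list every match. RegexWordDemo and ExtractWords are hypothetical names. On "red red red apples, Yum!" it should return red, red, red, apples and Yum, with the comma and exclamation mark left out.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo: collects every match of \b\w+\b in a string.
public static class RegexWordDemo
{
    public static List<string> ExtractWords(string text)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, @"\b\w+\b"))
        {
            words.Add(m.Value); // punctuation is never part of a match
        }
        return words;
    }
}
```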

// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Text.RegularExpressions;
string[] source = { "I like apples.", "I like red apples.",
                    "I like red apples than green apples.",
                    "red red red apples, Yum!" };

var frequencies = new Dictionary<string, int>();
int highestFreq = 0;

var combinedString = string.Join(" ", source);
var matches = Regex.Matches(combinedString, @"\b\w+\b");
foreach (Match match in matches)
{
    var word = match.Value;

    int freq;
    frequencies.TryGetValue(word, out freq);
    freq += 1;

    if (freq > highestFreq)
    {
        highestFreq = freq;
    }
    frequencies[word] = freq;
}

// All the words that share the highest frequency (there may be ties)
var highestWords = frequencies.Where(x => x.Value == highestFreq).Select(x => x.Key).ToList();

Console.WriteLine("Highest freq: {0}", highestFreq);
foreach (var word in highestWords)
{
    Console.WriteLine(word);
}


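This drops the periods (and other punctuation) from your sentences. If you want hyphenated words to count as one word instead of two, change the pattern to \b[\w-]+\b.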
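The effect of that change can be checked with a sketch like this one (HyphenPatternDemo, PlainWords, HyphenatedWords and Extract are hypothetical names). On "green-apples are nice", the first pattern should give four words and the hyphen-aware pattern three, with "green-apples" kept whole.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo comparing the two word patterns from this answer.
public static class HyphenPatternDemo
{
    public const string PlainWords = @"\b\w+\b";         // splits at hyphens
    public const string HyphenatedWords = @"\b[\w-]+\b"; // '-' is literal at the end of a class

    public static List<string> Extract(string text, string pattern)
    {
        var words = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```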

Most frequently used word in a text (C#, WinForm)
