Counting unique words in a string in C#

Time: 2015-11-30 06:18:49

Tags: c#

Suppose I have a string like this:

String: I have a car. I had bought it two years ago. I like it very much.

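I need to find the unique words in it. For example, the unique words in this string would be have, a, car, had, bought, it, two, years, etc.; these words occur only once in the string. I tried it with LINQ, please have a look: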

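    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string testingtext = "I have a car.I had bought it two years ago.I like it very much.";
    // Splitting on whitespace yields tokens such as "car.I", because there is no space after the periods.
    MatchCollection Wordcollection = Regex.Matches(testingtext, @"[\S]+");

    string[] array = Wordcollection.Cast<Match>().Select(x => x.Value).Distinct().OrderBy(y => y).ToArray();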

5 answers:

Answer 0 (score: 1)

Distinct won't work for this task. Distinct only removes the duplicate copies of a word. You still get every word once, whether or not it was unique.
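Here is a minimal sketch that makes the difference concrete. It uses a hypothetical `DistinctVsOnce` helper class and a hand-made word array, neither of which is part of the answer. `Distinct()` still returns the repeated words, while grouping and keeping only the groups of size one drops them.

```csharp
using System;
using System.Linq;

static class DistinctVsOnce
{
    // Distinct() keeps one copy of every word,
    // so words that repeat still appear in the result.
    public static string[] DistinctWords(string[] words) =>
        words.Distinct().ToArray();

    // Group identical words and keep only the groups with a single element,
    // i.e. the words that occur exactly once.
    public static string[] WordsOccurringOnce(string[] words) =>
        words.GroupBy(w => w)
             .Where(g => g.Count() == 1)
             .Select(g => g.Key)
             .ToArray();
}
```

Take the array `{ "I", "like", "it", "I", "bought", "it" }`. `DistinctWords` returns four words: I, like, it and bought. `WordsOccurringOnce` returns only like and bought.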

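Instead, you need to use GroupBy. It creates a collection of groups keyed by word, where each group contains every occurrence of that word.

Once that is done, just keep each key whose group contains exactly one value (i.e. the word occurs only once in the string):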

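    // Requires: using System.Collections.Generic; using System.Linq;
    string testingtext = "I have a car I had bought it two years ago I like it very much.";

    IEnumerable<string> allWords = testingtext.Split(' ');
    // Group identical words, then keep only the words whose group has exactly one element.
    IEnumerable<string> uniqueWords = allWords.GroupBy(w => w).Where(g => g.Count() == 1).Select(g => g.Key);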

If you want `car` and `car.` to be treated as the same word, you may also need to clean up the input text beforehand to remove punctuation.
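Below is one possible way to do that cleanup, sketched under a few assumptions:

- A word is any run of `\w` characters (letters, digits, underscore).
- Words are compared case-insensitively, so "I" and "i" count as the same word.
- The class name `UniqueWordFinder` is hypothetical.

Matching the words with `Regex.Matches` avoids having to list every punctuation character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class UniqueWordFinder
{
    // Sketch: "\w+" picks out runs of word characters, so "car.I" yields
    // "car" and "I". Grouping ignores case (an assumption); each key keeps
    // the casing of the word's first occurrence.
    public static string[] WordsOccurringOnce(string text)
    {
        return Regex.Matches(text, @"\w+")
            .Cast<Match>()
            .Select(m => m.Value)
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() == 1)
            .Select(g => g.Key)
            .ToArray();
    }
}
```

Apply it to the question's original string, "I have a car.I had bought it two years ago.I like it very much.". The sketch returns have, a, car, had, bought, two, years, ago, like, very, much. It drops "I" and "it" because they repeat.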

Answer 1 (score: 1)

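    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    public int GetUniqueWordsCount(string txt)
    {
        // Use a regular expression to replace every character
        // that is not a letter or digit with a space.
        txt = new Regex("[^a-zA-Z0-9]").Replace(txt, " ");

        // Split the text into words, dropping the empty entries
        // produced by consecutive spaces.
        var words = txt.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Use LINQ to get the distinct words. Distinct() keeps one copy of
        // every word, so this counts distinct words, not words that occur
        // exactly once.
        var wordQuery = words.Distinct();

        return wordQuery.Count();

        // If you want the words themselves, change the return type to string[] and use:
        // return wordQuery.ToArray();
    }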

Answer 2 (score: 0)

This may solve your problem:

string myStr = "I have a car.I had bought it two years ago.I like it very much";
// Split on spaces and periods so that "car.I" becomes "car" and "I".
var wordList = myStr.Split(new[] { ' ', '.' }, StringSplitOptions.RemoveEmptyEntries);
// Count each word and list the least frequent (i.e. unique) words first.
var output = wordList.GroupBy(x => x).Select(y => new Word { Text = y.Key, Repeat = y.Count() }).OrderBy(z => z.Repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.Text + " : " + item.Repeat + Environment.NewLine;
}

You also need to create a class (Word):

public class Word
{
    public string Text { get; set; }
    public int Repeat { get; set; }
}

Answer 3 (score: 0)

string[] wordsArray = testingtext.Replace(".", " ").Split(' ');
int carCounter = 0;
int haveCounter = 0;
//...

foreach (String word in wordsArray)
{
    if (word.Equals("car"))
        carCounter++;
    if (word.Equals("have"))
        haveCounter++;
    //...
}
After that, you know how many times each word occurs... simple.
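A `Dictionary<string, int>` can replace the separate counter variables, holding one count for every word the text contains. This is a minimal sketch that assumes the separator list and case-insensitive comparison shown. The `WordCounter` name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class WordCounter
{
    // Sketch: one dictionary entry per word instead of one counter variable
    // per word. The separator list is an assumption for this example.
    public static Dictionary<string, int> CountWords(string text)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ', '.', ',', '!', '?', '\t', '\r', '\n' };
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            counts.TryGetValue(word, out int current); // current is 0 when the word is new
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

The words with a count of 1 are the ones that occur exactly once.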

Answer 4 (score: 0)

No need for LINQ; a HashSet<String> is enough:

  String source = "I have a car I had bought it two years ago I like it very much.";
  //TODO: check this separator list against your input
  Char[] separators = new Char[] {' ', '\r', '\n', '\t', ',', '.', ';', '!', '?'};

  // Note: the set holds each distinct word once (ignoring case), not only the words that occur once.
  HashSet<String> uniqueWords = 
    new HashSet<String>(source.Split(separators, StringSplitOptions.RemoveEmptyEntries), 
    StringComparer.OrdinalIgnoreCase);

  // 13
  Console.Write(uniqueWords.Count);
  // ...
  // I, have, a, car, had, bought, it, two, years, ago, like, very, much
  Console.Write(String.Join(", ", uniqueWords));
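A single `HashSet<String>` yields each distinct word once. You may need only the words that occur exactly once, still without LINQ. In that case a second set can record the repeats.

This is a sketch that assumes the same separators and case-insensitive comparison as the answer. The `OnceOnlyWords` class is hypothetical. It relies on `HashSet<T>.Add` returning `false` when the item is already in the set.

```csharp
using System;
using System.Collections.Generic;

static class OnceOnlyWords
{
    // Sketch: "seen" collects every word; "repeated" collects the words
    // that turn up a second time.
    public static List<string> Find(string text)
    {
        char[] separators = { ' ', '\r', '\n', '\t', ',', '.', ';', '!', '?' };
        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var repeated = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        foreach (string word in words)
        {
            if (!seen.Add(word)) // Add returns false for a word already seen
                repeated.Add(word);
        }

        // Second pass keeps the non-repeated words in their original order.
        var result = new List<string>();
        foreach (string word in words)
        {
            if (!repeated.Contains(word))
                result.Add(word);
        }
        return result;
    }
}
```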

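Note that this kind of solution only works for simple cases. A *word* in natural language is a fuzzy concept, so in the general case of NLP (natural language processing) you have to use a specially designed library.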
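A middle ground short of a dedicated library is a Unicode-aware regular expression. It handles accented letters and internal apostrophes better than a hand-written list of separator characters. This is only a heuristic sketch. The pattern and the `Tokenizer` class are assumptions, and it still mishandles hyphenated words, abbreviations, and languages written without spaces.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class Tokenizer
{
    // Heuristic sketch, not an NLP tokenizer: a word is a run of Unicode
    // letters or digits, optionally joined by internal apostrophes
    // ("don't", "It's"). All other characters act as separators.
    static readonly Regex WordPattern = new Regex(@"[\p{L}\p{N}]+(?:'[\p{L}\p{N}]+)*");

    public static string[] Tokenize(string text) =>
        WordPattern.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For example, "I don't like it.It's café-time" is split into I, don't, like, it, It's, café and time.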