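Suppose there is a string like this:
String: I have a car. I had bought it two years ago. I like it very much. I need to find the unique words in it. For example, the unique words in this string are have, a, car, had, bought, two, years, etc.; these words occur only once in the string. I tried it with LINQ. Please take a look.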
string testingtext="I have a car.I had bought it two years ago.I like it very much.";
MatchCollection Wordcollection = Regex.Matches(testingtext, @"[\S]+");
string[] array = Wordcollection.Cast<Match>().Select(x => x.Value).Distinct().OrderBy(y => y).ToArray();
Answer 0 (score: 1)
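Distinct cannot be used for this task. Distinct only removes the duplicate copies of a word; you still get every word once, whether or not it was unique.
Instead, you need to use GroupBy. It creates a new sequence of key/value groups, each containing a word and every occurrence of it.
Once that is done, just keep every key whose group contains exactly one value (that is, the word occurs in the string only once):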
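using System.Collections.Generic;
using System.Linq;

string testingtext = "I have a car I had bought it two years ago I like it very much.";
IEnumerable<string> allWords = testingtext.Split(' ');
IEnumerable<string> uniqueWords = allWords
    .GroupBy(w => w)               // one group per distinct word
    .Where(g => g.Count() == 1)    // keep words that occur exactly once
    .Select(g => g.Key);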
If you want car and car. to be treated as the same word, you may also need to clean up the input beforehand to remove punctuation.
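As a minimal sketch of that clean-up combined with the GroupBy approach, assuming for illustration that anything other than a letter, digit or apostrophe separates words and that case should be ignored (WordsOccurringOnce is a hypothetical name; the code uses C# 9 top-level statements):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the words that occur exactly once in the input.
// Assumptions: runs of characters other than letters, digits and apostrophes
// separate words, and "Car" and "car" count as the same word.
static List<string> WordsOccurringOnce(string text)
{
    // Split on runs of separator characters, so "car.I" becomes "car" and "I".
    IEnumerable<string> words = Regex.Split(text, @"[^\p{L}\p{N}']+")
        .Where(w => w.Length > 0); // drop the empty string left by a trailing "."

    return words
        .GroupBy(w => w, StringComparer.OrdinalIgnoreCase) // one group per word, ignoring case
        .Where(g => g.Count() == 1)                         // keep words seen exactly once
        .Select(g => g.Key)
        .ToList();
}

string testingtext = "I have a car.I had bought it two years ago.I like it very much.";
// have, a, car, had, bought, two, years, ago, like, very, much
Console.WriteLine(string.Join(", ", WordsOccurringOnce(testingtext)));
```

GroupBy yields groups in the order their keys first appear, so the result keeps the original word order.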
Answer 1 (score: 1)
// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
public int GetUniqueWordsCount(string txt)
{
    // Use a regular expression to replace every character
    // that is not a letter or digit with a space.
    txt = new Regex("[^a-zA-Z0-9]").Replace(txt, " ");
    // Split the text into words.
    var words = txt.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    // Use LINQ to get the distinct words (each word once, repeated or not).
    var wordQuery = words.Distinct();
    return wordQuery.Count();
    // If you want the words themselves, change the return type to string[]
    // and return wordQuery.ToArray();
}
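GetUniqueWordsCount returns the number of distinct words, which for the sample sentence is not the same as the number of words that occur exactly once. As a hedged sketch for comparison, the snippet below applies the same clean-up and computes both counts; CountWords is a hypothetical name, and the code uses C# 9 top-level statements:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns (distinct words, words occurring exactly once),
// using the same clean-up as GetUniqueWordsCount (non-alphanumerics become spaces).
static (int DistinctCount, int OnceCount) CountWords(string txt)
{
    string[] words = Regex.Replace(txt, "[^a-zA-Z0-9]", " ")
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    int distinctCount = words.Distinct().Count();                      // every word counted once
    int onceCount = words.GroupBy(w => w).Count(g => g.Count() == 1);  // only words that never repeat
    return (distinctCount, onceCount);
}

var counts = CountWords("I have a car.I had bought it two years ago.I like it very much.");
// distinct: 13, occurring once: 11
Console.WriteLine($"distinct: {counts.DistinctCount}, occurring once: {counts.OnceCount}");
```

The gap of two comes from I (three times) and it (twice): Distinct keeps them, the GroupBy filter drops them.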
Answer 2 (score: 0)
This might solve your problem:
string MyStr = "I have a car.I had bought it two years ago.I like it very much";
// A null separator array splits on any white-space character.
var wordList = MyStr.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
var output = wordList.GroupBy(x => x)
    .Select(y => new Word { Text = y.Key, Repeat = y.Count() })
    .OrderBy(z => z.Repeat);
foreach (var item in output)
{
    textBoxfile.Text += item.Text + " : " + item.Repeat + Environment.NewLine;
}
You also need to create a class named Word:
public class Word
{
    public string Text { get; set; }
    public int Repeat { get; set; }
}
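If a separate class feels like overkill, an anonymous type can carry the same two values. A minimal sketch (C# 9 top-level statements, writing to the console instead of the answer's textBoxfile control; the alphabetical ThenBy tie-break is an addition). Because the sample string has no space after each period, tokens such as car.I and ago.I are counted as single words:

```csharp
using System;
using System.Linq;

string myStr = "I have a car.I had bought it two years ago.I like it very much";

// A null separator array means "split on any white-space character".
string[] wordList = myStr.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

// Anonymous type instead of a named Word class.
var output = wordList
    .GroupBy(x => x)
    .Select(g => new { Text = g.Key, Repeat = g.Count() })
    .OrderBy(z => z.Repeat)                         // least frequent first
    .ThenBy(z => z.Text, StringComparer.Ordinal)    // ties sorted alphabetically (ordinal)
    .ToList();

foreach (var item in output)
{
    Console.WriteLine(item.Text + " : " + item.Repeat);
}
```

Filtering output with Where(z => z.Repeat == 1) would leave only the words that occur once.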
Answer 3 (score: 0)
string[] wordsArray = testingtext.Replace("."," ").Split(' ');
int carCounter = 0;
int haveCounter = 0;
//...
foreach(String word in wordsArray )
{
if(word.Equals("car"))
carCounter++;
if(word.Equals("have"))
haveCounter++;
//...
}
After the loop, each counter holds how many times its word occurs... simple.
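Rather than writing one counter variable per word, a Dictionary<string, int> can hold a counter for every word in a single loop. A minimal sketch in C# 9 top-level statements, assuming the same period-to-space preprocessing as above plus StringSplitOptions.RemoveEmptyEntries (an addition, so that the empty strings produced by ". " are not counted):

```csharp
using System;
using System.Collections.Generic;

string testingtext = "I have a car.I had bought it two years ago.I like it very much.";

// Periods become spaces; empty entries from consecutive separators are dropped.
string[] wordsArray = testingtext.Replace(".", " ")
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

// One counter per distinct word instead of carCounter, haveCounter, ...
var counts = new Dictionary<string, int>();
foreach (string word in wordsArray)
{
    counts.TryGetValue(word, out int current); // current is 0 when the word is new
    counts[word] = current + 1;
}

// Print the words that occur exactly once.
foreach (KeyValuePair<string, int> pair in counts)
{
    if (pair.Value == 1)
        Console.WriteLine(pair.Key);
}
```

The comparison is case-sensitive; passing StringComparer.OrdinalIgnoreCase to the Dictionary constructor would merge "I" and "i".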
Answer 4 (score: 0)
No LINQ is needed; a HashSet<String> is enough:
String source = "I have a car I had bought it two years ago I like it very much.";
//TODO: check this
Char[] separators = new Char[] {' ', '\r', '\n', '\t', ',', '.', ';', '!', '?'};
HashSet<String> uniqueWords = new HashSet<String>(
    source.Split(separators, StringSplitOptions.RemoveEmptyEntries),
    StringComparer.OrdinalIgnoreCase);
// 13
Console.Write(uniqueWords.Count);
...
// I, have, a, car, had, bought, it, two, years, ago, like, very, much
Console.Write(String.Join(", ", uniqueWords));
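Note that this kind of solution only works in simple cases: a "word" in natural language is a fuzzy concept, so in the general case of NLP (natural language processing) you have to use a library designed specifically for it.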
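The HashSet in this answer holds every distinct word (13 for this sentence), including repeated ones such as I and it. If only the words that occur exactly once are wanted and LINQ is still to be avoided, one possible approach is a second set that records repeats. This is a sketch in C# 9 top-level statements that reuses the answer's separator list and case-insensitive comparison; seen, repeated and once are illustrative names:

```csharp
using System;
using System.Collections.Generic;

String source = "I have a car I had bought it two years ago I like it very much.";
Char[] separators = new Char[] { ' ', '\r', '\n', '\t', ',', '.', ';', '!', '?' };

var seen = new HashSet<String>(StringComparer.OrdinalIgnoreCase);     // words met at least once
var repeated = new HashSet<String>(StringComparer.OrdinalIgnoreCase); // words met more than once

foreach (String word in source.Split(separators, StringSplitOptions.RemoveEmptyEntries))
{
    // Add returns false when the word is already in the set.
    if (!seen.Add(word))
        repeated.Add(word);
}

// Words that were seen but never repeated occur exactly once.
var once = new HashSet<String>(seen, StringComparer.OrdinalIgnoreCase);
once.ExceptWith(repeated);

// 11
Console.WriteLine(once.Count);
Console.WriteLine(String.Join(", ", once));
```

HashSet does not guarantee an enumeration order, so if the original word order matters, collect the once-only words into a List while scanning the input a second time.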