I have this function that extracts all the words from a piece of text:
public static string[] GetSearchWords(string text)
{
    // \S+ matches each run of non-whitespace characters, i.e. one "word".
    string pattern = @"\S+";
    Regex re = new Regex(pattern);
    MatchCollection matches = re.Matches(text);
    string[] words = new string[matches.Count];
    for (int i = 0; i < matches.Count; i++)
    {
        words[i] = matches[i].Value;
    }
    return words;
}
I want to exclude a list of words from the returned array. The word list looks like this:
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
How can I modify the function above so that it does not return the words in this list?
Answer 0 (score: 5)
string strWordsToExclude="if,you,me,about,more,but,by,can,could,did";
var ignoredWords = strWordsToExclude.Split(',');
return words.Except(ignoredWords).ToArray();
I think the Except method fits your needs.
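For context, here is a minimal sketch of how those three lines might slot into the original function. The class name SearchWords and the second parameter are assumptions made for illustration and are not part of the answer. Be aware that Except is a set operation: it also drops duplicate words from the result, and with the default comparer the match is case-sensitive.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchWords
{
    // Hypothetical wrapper: the question's regex extraction plus the Except call.
    public static string[] GetSearchWords(string text, string wordsToExclude)
    {
        string[] ignoredWords = wordsToExclude.Split(',');

        return Regex.Matches(text, @"\S+")
            .Cast<Match>()              // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .Except(ignoredWords)       // set difference: duplicates are removed as well
            .ToArray();
    }
}
```

If repeated words need to be kept, filtering with Where instead of Except may be the better fit.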
Answer 1 (score: 2)
If you are not forced to use Regex, you can use a bit of LINQ:
void Main()
{
    var wordsToExclude = "if,you,me,about,more,but,by,can,could,did".Split(',');
    string str = "if you read about cooking you can cook";
    var newWords = GetSearchWords(str, wordsToExclude); // read, cooking, cook
}
string[] GetSearchWords(string text, IEnumerable<string> toExclude)
{
    // A null separator splits on whitespace; RemoveEmptyEntries avoids
    // empty "words" when the text contains consecutive spaces.
    var words = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    return words.Where(word => !toExclude.Contains(word)).ToArray();
}
I am assuming that a word is a sequence of non-whitespace characters.
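A possible variation on the same idea, sketched under two assumptions that are not in the answer: matching should ignore case, and the exclusion list may grow large enough that a HashSet lookup is worth it. The class name SearchWordFilter is hypothetical. Unlike Except, this version keeps repeated words in the output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SearchWordFilter
{
    public static string[] GetSearchWords(string text, IEnumerable<string> toExclude)
    {
        // Assumption: "If" and "if" should both be excluded.
        var excluded = new HashSet<string>(toExclude, StringComparer.OrdinalIgnoreCase);

        return text
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // split on any whitespace
            .Where(word => !excluded.Contains(word))
            .ToArray();
    }
}
```

Punctuation stays attached to words here, just as it does with the \S+ pattern, so a token such as "can," would not match "can" in the exclusion list.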