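I need to remove words from a string based on a given set of words.
The words I want to remove:
DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND
If I receive a string like this:
Edit: this string has already been "cleaned" of any symbols.
THIS IS AN AMAZING WEBSITE AND LAYOUT
The result should be:
THIS IS AMAZING WEBSITE LAYOUT
So far I have: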
public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });
    string pattern = "";
    foreach (string word in splitWords)
    {
        pattern = @"\b" + word + "\b";
        stringToClean = Regex.Replace(stringToClean, pattern, "");
    }
    return stringToClean;
}
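But it doesn't remove the words. Any ideas?
I don't know whether this is the most efficient way to do it, or whether I should keep the words in an array to avoid splitting them on every call.
Thanks
Answer 0 (score: 6)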
private static List<string> wordsToRemove =
    "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' ').ToList();

public static string StringWordsRemove(string stringToClean)
{
    // Note: Except is a set difference, so words repeated in the input
    // are also collapsed to a single occurrence.
    return string.Join(" ", stringToClean.Split(' ').Except(wordsToRemove));
}
A modification that handles punctuation:
public static string StringWordsRemove(string stringToClean)
{
    // Define how to tokenize the input string, i.e. on spaces only or on punctuation as well
    return string.Join(" ", stringToClean
        .Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .Except(wordsToRemove));
}
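Because `Except` also drops repeated words, a filter based on `Where` may be closer to what the question asks for. The following is a minimal sketch, not the answer author's code: the class name `WordFilter`, the method name `RemoveWords`, and the case-insensitive comparison are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Hypothetical helper; the word list is the one from the question.
    // Assumption: matching is case-insensitive (the question only shows upper case).
    private static readonly HashSet<string> WordsToRemove = new HashSet<string>(
        "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' '),
        StringComparer.OrdinalIgnoreCase);

    public static string RemoveWords(string input)
    {
        // Where keeps every token that is not in the set, so repeated words survive.
        var kept = input
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !WordsToRemove.Contains(word));
        return string.Join(" ", kept);
    }
}
```

With this sketch, `WordFilter.RemoveWords("NEW YORK AND NEW JERSEY")` keeps both occurrences of `NEW`, whereas the `Except` version would keep only the first. The `HashSet` also makes each lookup constant time instead of scanning the list.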
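Answer 1 (score: 1)
I just changed this line
pattern = @"\b" + word + "\b";
to this
pattern = @"\b" + word + @"\b"; //added '@'
Without the @, "\b" in a regular C# string literal is the backspace character rather than the regex word-boundary token, so the pattern never matches.
With the change I got the result
THIS IS AMAZING WEBSITE LAYOUT
It would be even better to use String.Empty instead of "":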
stringToClean = Regex.Replace(stringToClean, pattern, String.Empty);
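To make the difference between the two literals concrete, here is a small sketch. The class `BoundaryDemo` and the method `RemoveWord` are hypothetical names, and the call to `Regex.Escape` is an addition that guards against words containing regex metacharacters; it is not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Length of the regular literal "\b": a single backspace character (U+0008).
    public static int RegularLiteralLength() { return "\b".Length; }

    // Length of the verbatim literal @"\b": a backslash followed by 'b'.
    public static int VerbatimLiteralLength() { return @"\b".Length; }

    public static string RemoveWord(string input, string word)
    {
        // Both boundaries use verbatim literals, so the regex engine sees \b on each side.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(input, pattern, string.Empty);
    }
}
```

Note that removing a word this way leaves the spaces on both sides of it in place, so `BoundaryDemo.RemoveWord("THIS IS AN AMAZING", "AN")` returns the text with two consecutive spaces between `IS` and `AMAZING`. A whitespace clean-up step is needed if single spacing matters.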
Answer 2 (score: 1)
I used LINQ:
string exceptions = "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND";
string[] exceptionsList = exceptions.Split(' ');
string test ="THIS IS AN AMAZING WEBSITE AND LAYOUT";
string[] wordList = test.Split(' ');
string final = null;
var result = wordList.Except(exceptionsList).ToArray();
final = String.Join(" ",result);
Console.WriteLine(final);
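Answer 3 (score: 0)
The output you get is "THIS IS AMAZING WEBSITE LAYOUT".
I ran into a problem where a stray "D" was left in the result (giving THIS IS AMAZING WEBSITE D LAYOUT), because a plain Replace also replaces parts of longer words. The code below removes a word only when the whole word is one of the words you defined (I assume that is what you want?).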
string[] tabooWords = "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' ');
string text = "THIS IS AN AMAZING WEBSITE AND LAYOUT";
string result = text;
foreach (string word in text.Split(' '))
{
    if (tabooWords.Contains(word.ToUpper()))
    {
        // Caveat: IndexOf returns the first substring match, which can fall inside
        // an earlier, longer word; the surrounding spaces are also left in place.
        int start = result.IndexOf(word);
        result = result.Remove(start, word.Length);
    }
}
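Answer 4 (score: 0)
public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });
    // Caveat: the pattern needs a space on both sides, so it misses words at the
    // start or end of the string and the second of two adjacent words to remove.
    string pattern = " (" + string.Join("|", splitWords) + ") ";
    string cleaned = Regex.Replace(stringToClean, pattern, " ");
    return cleaned;
}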
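One way to keep the whitespace-delimited idea while still covering words at the start or end of the string, or two removable words in a row, is to use lookarounds instead of literal spaces. This is a sketch under assumptions, not the answer author's method: `LookaroundFilter` and `RemoveWords` are hypothetical names, and the final whitespace normalization is an added step.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LookaroundFilter
{
    public static string RemoveWords(string stringToClean, string wordsToRemove)
    {
        var alternatives = wordsToRemove
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Regex.Escape(w));

        // (?<!\S) and (?!\S) assert "no non-space character next to the word"
        // without consuming the neighbouring spaces, so adjacent words still match.
        string pattern = @"(?<!\S)(?:" + string.Join("|", alternatives) + @")(?!\S)";

        string cleaned = Regex.Replace(stringToClean, pattern, string.Empty);

        // Collapse the leftover runs of whitespace and trim the ends.
        return Regex.Replace(cleaned, @"\s+", " ").Trim();
    }
}
```

Unlike `\b`, these lookarounds treat only whitespace as a separator, so a token such as `E-MAIL` is left untouched even though `E` is in the list. Whether that is desirable depends on how the input was cleaned beforehand.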
Answer 5 (score: 0)
How about:
// make a pattern to match all words
var pattern = string.Format(
    @"\b({0})\b",
    string.Join("|", wordsToRemove.Split(new[] { ' ' })));
// pattern will be of the form "\b(badword1|badword2|...)\b"
// remove all the bad words from the string in one go.
var cleanString = Regex.Replace(stringToClean, pattern, string.Empty);
// normalise the white space in the string (one space at a time)
var normalisedString = Regex.Replace(cleanString, @"\s+", " ");
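The first statement builds a pattern that matches any of the words to remove. The second replaces them all in a single pass, which avoids running one regex per word. The third normalizes the whitespace in the string.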
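The three statements can be wrapped into a self-contained method along the following lines. This is a sketch rather than a definitive implementation: the class name `RegexWordFilter`, the use of `Regex.Escape`, the non-capturing group, and the trailing `Trim()` are assumptions added for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexWordFilter
{
    public static string RemoveWords(string stringToClean, string wordsToRemove)
    {
        // Escape each word so that regex metacharacters in the list are matched literally.
        var words = wordsToRemove
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => Regex.Escape(w));

        // One alternation wrapped in word boundaries: \b(?:DE|DA|...)\b
        string pattern = @"\b(?:" + string.Join("|", words) + @")\b";

        // Remove all listed words in a single pass.
        string cleaned = Regex.Replace(stringToClean, pattern, string.Empty);

        // Collapse runs of whitespace and trim spaces left at either end.
        return Regex.Replace(cleaned, @"\s+", " ").Trim();
    }
}
```

If the same word list is used repeatedly, the pattern could be built once and stored in a static `Regex` instance rather than rebuilt on every call.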
Answer 6 (score: 0)
Or...
stringToClean = Regex.Replace(stringToClean, @"\bDE\b|\bDA\b|\bDAS\b|\bDO\b|\bDOS\b|\bAN\b|\bNAS\b|\bNO\b|\bNOS\b|\bEM\b|\bE\b|\bA\b|\bAS\b|\bO\b|\bOS\b|\bAO\b|\bAOS\b|\bP\b|\bLDA\b|\bAND\b", String.Empty);
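// Collapse the runs of spaces left behind; replacing " " with String.Empty would strip every space.
stringToClean = Regex.Replace(stringToClean, @"\s+", " ").Trim();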