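Remove words from a string using the words in an array with C#

Posted: 2013-07-16 14:07:48

Tags: c# arrays string

I need to remove words from a string based on a set of words:

Words I want to remove: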

DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND

If I receive a string like this:

EDIT: this string has already been "cleaned" of any symbols

THIS IS AN AMAZING WEBSITE AND LAYOUT

The result should be:

THIS IS AMAZING WEBSITE LAYOUT

So far I have:

public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });

    string pattern = "";

    foreach (string word in splitWords)
    {
        pattern = @"\b" + word + "\b";
        stringToClean = Regex.Replace(stringToClean, pattern, "");
    }

    return stringToClean;
}

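But it doesn't remove the words. Any ideas?

I'm also not sure whether I'm using the most efficient approach. Should I put the words in an array so I don't have to split them every time?

Thanks

7 answers:

Answer 0 (score: 6)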

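using System.Collections.Generic;
using System.Linq;

// Build the stop-word list once instead of splitting it on every call
private static List<string> wordsToRemove =
    "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' ').ToList();

public static string StringWordsRemove(string stringToClean)
{
    // Split into words, drop the stop words, and re-join with single spaces.
    // Note: Except is a set operation, so it also removes duplicate words.
    return string.Join(" ", stringToClean.Split(' ').Except(wordsToRemove));
}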

A modified version that also handles punctuation:

public static string StringWordsRemove(string stringToClean)
{
    // Define how to tokenize the input string, i.e. space only or punctuations also
    return string.Join(" ", stringToClean
        .Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries)
        .Except(wordsToRemove));
}
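Both versions above rely on `Except`. Because `Except` is a set operation, it also collapses words that legitimately repeat in the input: "GO GO A GO" comes back as just "GO". The sketch below keeps duplicates by filtering with `Where` against a `HashSet<string>` that is built once. It assumes case-sensitive matching, as in the question. The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Built once; HashSet gives constant-time lookups per word
    private static readonly HashSet<string> WordsToRemove = new HashSet<string>(
        "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' '));

    public static string RemoveWords(string stringToClean)
    {
        // Where keeps every non-stop word in its original order, including repeats,
        // unlike Except, which yields only distinct elements
        var kept = stringToClean
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !WordsToRemove.Contains(word));
        return string.Join(" ", kept);
    }
}
```

For the question's input, `StopWordFilter.RemoveWords("THIS IS AN AMAZING WEBSITE AND LAYOUT")` returns "THIS IS AMAZING WEBSITE LAYOUT". `RemoveEmptyEntries` also means that accidental double spaces in the input do not produce empty tokens.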

Answer 1 (score: 1)

I just changed this line

pattern = @"\b" + word + "\b";

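to this

pattern = @"\b" + word + @"\b"; // added '@' so \b is a regex word boundary, not a backspace

and got the result

THIS IS AMAZING WEBSITE LAYOUT

It is arguably a little cleaner to use String.Empty instead of "" (the result is the same):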

stringToClean = Regex.Replace(stringToClean, pattern, String.Empty);
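This explains why the original code silently did nothing. In a regular (non-verbatim) C# string literal, `"\b"` is the escape for the backspace character (U+0008), so the pattern ended with a literal backspace that never occurs in the input. With the `@` prefix, `@"\b"` is the two characters `\` and `b`, and the regex engine reads that as a word boundary.

The sketch below shows the difference and applies the per-word loop correctly. Two parts of it are additions that are not in the answer: wrapping each word in `Regex.Escape`, and collapsing the leftover spaces at the end. The class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // "\b" is a single backspace character; @"\b" is a backslash followed by 'b'
    public static bool BackspaceVsBoundary() =>
        "\b" == "\u0008" && @"\b".Length == 2;

    public static string StringWordsRemove(string stringToClean, string wordsToRemove)
    {
        foreach (string word in wordsToRemove.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Both boundaries use verbatim strings; Regex.Escape guards against
            // words that contain regex metacharacters
            string pattern = @"\b" + Regex.Escape(word) + @"\b";
            stringToClean = Regex.Replace(stringToClean, pattern, string.Empty);
        }

        // Removing a word leaves its surrounding spaces behind, so collapse them
        return Regex.Replace(stringToClean, @"\s+", " ").Trim();
    }
}
```

Without the final whitespace step, the loop on its own would leave two spaces wherever a word was removed.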

Answer 2 (score: 1)

I used LINQ:

using System;
using System.Linq;

string exceptions = "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND";
string[] exceptionsList = exceptions.Split(' ');

string test = "THIS IS AN AMAZING WEBSITE AND LAYOUT";
string[] wordList = test.Split(' ');

// Keep only the words that are not in the exception list, then re-join them
var result = wordList.Except(exceptionsList).ToArray();
string final = String.Join(" ", result);

Console.WriteLine(final);

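Answer 3 (score: 0)

This outputs "THIS IS AMAZING WEBSITE LAYOUT".

I ran into a problem where it left a "D" in the result ("THIS IS AMAZING WEBSITE D LAYOUT"), because a plain Replace only replaces the matching part of a word. This removes the whole word when it matches one of the words you defined (I assume that's what you want?).

using System.Linq;

string[] tabooWords = "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND".Split(' ');
string text = "THIS IS AN AMAZING WEBSITE AND LAYOUT";
string result = text;

foreach (string word in text.Split(' '))
{
    // Case-insensitive check against the taboo list
    if (tabooWords.Contains(word.ToUpper()))
    {
        // Caution: IndexOf finds the first occurrence of the substring,
        // which may sit inside a longer word that appears earlier
        int start = result.IndexOf(word);
        result = result.Remove(start, word.Length);
    }
}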
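`IndexOf` returns the first occurrence of the substring anywhere in `result`, so a short taboo word can match inside an earlier, longer word. With the word `A` and the text "AMAZING A", it removes the first letter of `AMAZING` instead of the standalone `A`.

The sketch below keeps the answer's remove-in-place idea but uses each token's real position, found with `Regex.Matches` on `\S+`. It removes tokens from the end of the string backwards so that earlier positions stay valid. As in the answer, the spaces around removed words are left in place. The names are illustrative.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class PositionalRemover
{
    public static string RemoveTabooWords(string text, ISet<string> tabooWords)
    {
        var result = new StringBuilder(text);
        // Each match is one whitespace-delimited token with its true start index
        MatchCollection tokens = Regex.Matches(text, @"\S+");

        // Walk the tokens backwards so removing one does not shift
        // the start index of any token still to be checked
        for (int i = tokens.Count - 1; i >= 0; i--)
        {
            Match token = tokens[i];
            // Case-insensitive comparison, like the answer's ToUpper check
            if (tabooWords.Contains(token.Value.ToUpperInvariant()))
            {
                result.Remove(token.Index, token.Length);
            }
        }

        return result.ToString();
    }
}
```

With this version, "AMAZING A" becomes "AMAZING " (with a trailing space), and `AMAZING` is left intact.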

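Answer 4 (score: 0)

using System.Text.RegularExpressions;

public static string StringWordsRemove(string stringToClean, string wordsToRemove)
{
    string[] splitWords = wordsToRemove.Split(new Char[] { ' ' });
    // Matches " DE ", " DA ", ... including the surrounding spaces.
    // Limitation: a stop word at the start or end of the string, or directly
    // after another matched stop word, has no space of its own and is not matched
    string pattern = " (" + string.Join("|", splitWords) + ") ";
    string cleaned = Regex.Replace(stringToClean, pattern, " ");
    return cleaned;
}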
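Because this pattern consumes the space on each side of a word, it cannot remove a stop word at the very start or end of the string, where there is no space to match. It also fails on adjacent stop words such as "X AN AND Y": the two words compete for the same space, so the second one survives.

The sketch below keeps the same spaces-only idea of a word but uses zero-width lookarounds. These check for the delimiters without consuming them. Leftover runs of spaces are collapsed afterwards. The class name, `Regex.Escape`, and the `Trim` step are illustrative additions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundRemover
{
    public static string StringWordsRemove(string stringToClean, string wordsToRemove)
    {
        string[] splitWords = wordsToRemove.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Escape each word so regex metacharacters are matched literally
        string alternation = string.Join("|", Array.ConvertAll(splitWords, w => Regex.Escape(w)));

        // (?<=^| ) and (?= |$) require a space or a string edge on each side
        // without consuming it, so adjacent stop words can each match
        string pattern = "(?<=^| )(?:" + alternation + ")(?= |$)";

        string cleaned = Regex.Replace(stringToClean, pattern, string.Empty);

        // Collapse the runs of spaces left behind and drop edge spaces
        return Regex.Replace(cleaned, " {2,}", " ").Trim();
    }
}
```

Unlike `\b`, this treats only spaces as word delimiters. That matches the question's note that the input has already been cleaned of symbols.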

Answer 5 (score: 0)

How about:

using System.Text.RegularExpressions;

// make a pattern to match all words
var pattern = string.Format(
    @"\b({0})\b",
    string.Join("|", wordsToRemove.Split(new[] { ' ' })));

// pattern will be of the form "\b(badword1|badword2|...)\b"

// remove all the bad words from the string in one go.
var cleanString = Regex.Replace(stringToClean, pattern, string.Empty);

// normalise the white space in the string (one space at a time)
var normalisedString = Regex.Replace(cleanString, @"\s+", " ");

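The first statement builds a pattern that matches any of the words to remove. The second replaces all of them in one pass, which avoids looping over the words one at a time. The third normalises the whitespace in the string.

Answer 6 (score: 0)

Or...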
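The question also asks whether the word list should be split only once. The combined pattern can go a step further: it can be built into a single `Regex` instance when the class loads, so neither the split nor the pattern construction is repeated on each call.

The sketch below works under that assumption. `Regex.Escape` on each word and the final `Trim` are additions not in the answer, and the class and member names are illustrative.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordCleaner
{
    private const string WordsToRemove =
        "DE DA DAS DO DOS AN NAS NO NOS EM E A AS O OS AO AOS P LDA AND";

    // Built once: \b(DE|DA|...|AND)\b, with each word escaped for safety
    private static readonly Regex StopWords = new Regex(
        @"\b(" + string.Join("|", WordsToRemove.Split(' ').Select(w => Regex.Escape(w))) + @")\b",
        RegexOptions.Compiled);

    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string Clean(string stringToClean)
    {
        // Remove every stop word in one pass, then normalise the spacing
        string removed = StopWords.Replace(stringToClean, string.Empty);
        return Whitespace.Replace(removed, " ").Trim();
    }
}
```

`RegexOptions.Compiled` trades a slower first use for faster matching afterwards. It mainly pays off when `Clean` is called many times.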

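using System.Text.RegularExpressions;

// Remove each listed word as a whole word (\b = word boundary)
stringToClean = Regex.Replace(stringToClean, @"\bDE\b|\bDA\b|\bDAS\b|\bDO\b|\bDOS\b|\bAN\b|\bNAS\b|\bNO\b|\bNOS\b|\bEM\b|\bE\b|\bA\b|\bAS\b|\bO\b|\bOS\b|\bAO\b|\bAOS\b|\bP\b|\bLDA\b|\bAND\b", String.Empty);
// Collapse the runs of spaces left behind into a single space
stringToClean = Regex.Replace(stringToClean, " {2,}", " ");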