C# bad word filter with spaces

Asked: 2013-12-28 13:13:49

Tags: c# filter

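I'm trying to write a bad word filter, but it fails if someone writes "f u c k". So I tried stripping out all the spaces, filtering, and then putting the spaces back. Here is a picture to illustrate: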

http://i42.tinypic.com/353eee9.png

I hope you understand! :) If you don't understand me, please don't give me a "-1".

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
    private static List<string> list_0;
    private static List<string> list_1;
    private static List<bool> list_2;
    private static List<string> list_3;
    public Form1()
    {
        InitializeComponent();
        list_0 = new List<string>();
        list_1 = new List<string>();
        list_2 = new List<bool>();
        list_3 = new List<string>();
        list_0.Add("fuck");
        list_1.Add("****");
        list_2.Add(true);
    }

    private void button1_Click(object sender, EventArgs e)
    {
        string Message = textBox2.Text;

        if (list_0 != null && list_0.Count > 0)
        {
            int num = -1;
            foreach (string current in list_0)
            {
                textBox3.Text = FilterSpace(Message.ToLower());
                num++;
                if (FilterSpace(Message.ToLower()).Contains(current.ToLower()) && list_2[num])
                {
                    Message = Regex.Replace(FilterSpace(Message.ToLower()), current, list_1[num], RegexOptions.IgnoreCase);
                }
                else
                {
                    if (FilterSpace(Message.ToLower()).Contains(" " + current.ToLower() + " "))
                    {
                        Message = Regex.Replace(Message, current, list_1[num], RegexOptions.IgnoreCase);
                    }
                }
                textBox1.Text = Message;
            }
        }
    }

    public static string FilterSpace(string message)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < message.Length; i++)
        {
            char c = message[i];
            if (c == ' ')
                sb.Append("");
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
}

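2 Answers:

Answer 0 (score: 6)

You can handle this with a regular expression, asking the .NET Regex object to replace the matching strings in the input string.

You need to construct the pattern carefully so that it handles the spaces.

Here is a LINQPad program that demonstrates this: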

void Main()
{
    string[] badWords = new[] { "bad", "word", "words" };

    string input = "This is a bad string containing some of the words in the"
        + " list, even one w o r d that has whitespace";
    string output = Filter(input, badWords);
    Debug.WriteLine(output);
}

public static string Filter(string input, string[] badWords)
{
    var re = new Regex(
        @"\b("
        + string.Join("|", badWords.Select(word =>
            string.Join(@"\s*", word.ToCharArray())))
        + @")\b", RegexOptions.IgnoreCase);
    return re.Replace(input, match =>
    {
        return new string('*', match.Length);
    });
}

Basically, I construct a regular expression like this:

\b(              <-- start at a word boundary and start a capture group
b\s*a\s*d        <-- the word "bad" with an optional amount of whitespace
|                <-- next word
w\s*o\s*r\s*d    <-- the word "word" with an optional amount of whitespace
|                <-- next word
... and so on
)\b              <-- end the capture group, and end at a word boundary

Then I use a match evaluator delegate to replace each matched string with the appropriate number of asterisks.

Final output:

  

This is a *** string containing some of the ***** in the list, even one ******* that has whitespace
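
To inspect the pattern that this approach builds, here is a minimal sketch. The `BadWordPattern` class and `BuildPattern` helper are hypothetical names, and the `Regex.Escape` call is an addition that is not in the program above; it assumes a bad word might contain regex metacharacters such as `.` or `+`, which should be matched literally.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BadWordPattern
{
    // Hypothetical helper: builds the same alternation as Filter above,
    // but escapes each character so metacharacters in a word stay literal.
    public static string BuildPattern(string[] badWords)
    {
        var alternatives = badWords.Select(word =>
            string.Join(@"\s*", word.Select(c => Regex.Escape(c.ToString()))));
        return @"\b(" + string.Join("|", alternatives) + @")\b";
    }

    // Same replacement logic as the answer: one asterisk per matched
    // character, including any whitespace inside the match.
    public static string Filter(string input, string[] badWords)
    {
        var re = new Regex(BuildPattern(badWords), RegexOptions.IgnoreCase);
        return re.Replace(input, match => new string('*', match.Length));
    }
}
```

Calling `BadWordPattern.BuildPattern(new[] { "bad", "word" })` returns the pattern string, which can be printed to check it against the breakdown above.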

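Answer 1 (score: 0)

Use RegEx. For example, compare the input against a list of banned words and replace the match, or the first three characters of the match, with '*'. For this you can use regular expression options (the link contains examples): Regular Expression Options

Set the option to ignore all whitespace and then match. That way all combinations (e.g. f.u.c.k, f ..... u ... ck, ...) are caught. You can also ignore case this way.

A usage example taken from MSDN (see the link above):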

string pattern = @"d \w+ \s";
string input = "Dogs are decidedly good pets.";
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace;

foreach (Match match in Regex.Matches(input, pattern, options))
   Console.WriteLine("'{0}// found at index {1}.", match.Value, match.Index);
// The example displays the following output: 
//    'Dogs // found at index 0. 
//    'decidedly // found at index 9.
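
One caveat worth checking before relying on this: `RegexOptions.IgnorePatternWhitespace` makes the engine ignore unescaped whitespace in the pattern, not whitespace or dots in the input, so on its own it will not catch spaced-out or dotted spellings. A minimal sketch of one way to tolerate separators instead, assuming that only whitespace and `.` count as separators; the `SeparatorTolerantFilter` class and its method names are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorTolerantFilter
{
    // Hypothetical helper: allows any run of whitespace or dots between
    // the letters of the word, then masks the whole match with asterisks.
    public static string Censor(string input, string word)
    {
        string pattern = string.Join(@"[\s.]*",
            word.Select(c => Regex.Escape(c.ToString())));
        return Regex.Replace(input, pattern,
            m => new string('*', m.Length), RegexOptions.IgnoreCase);
    }

    // Shows the effect of IgnorePatternWhitespace: unescaped whitespace is
    // dropped from the pattern, while the input is matched unchanged.
    public static bool MatchesIgnoringPatternWhitespace(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    }
}
```

With this sketch, `MatchesIgnoringPatternWhitespace("b a d", "bad")` is false while `MatchesIgnoringPatternWhitespace("bad", "b a d")` is true, and `Censor("b.a.d idea", "bad")` masks the dotted spelling. Extending the separator class (for example to `-` or `_`) is a design choice that trades more catches for more false positives.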