Question

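I have a bad-word replacement script in VB.NET that has caused a lot of problems. After much trial and error, the current code works, but it does not filter out words that contain capital letters.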

    Private Function CheckForBadWords(ByVal InputString As String) As String
        Dim r As Regex
        Dim element As String
        Dim eLength As Integer
        Dim x As Integer
        Dim AttachtoEnd As String
        For Each element In alWordList
            r = New Regex("\b" & element)
            eLength = element.Length
            For x = 3 To eLength - 1
                AttachtoEnd = AttachtoEnd & "*"
            Next
            InputString = r.Replace(InputString, element, Left(element, 3) & AttachtoEnd)
            AttachtoEnd = ""
        Next
        Return InputString
    End Function

How can I check for words with capital letters? For example, phuck gets caught, whereas Phuck or PHUCK does not.

I tried this tutorial, but it is in C# and I barely know VB.NET: http://www.dreamincode.net/forums/topic/67129-creating-a-bad-word-filter-functionality-in-aspnet-wc%23/

Adding more detail: with some help, the following seems to work after many tweaks, but bugs remain, particularly with single quotes, double quotes, or <br>s.

    Private Function CheckForBadWords(ByVal InputString As String) As String
        Dim starPosition As Integer = 0
        Dim element As String
        Dim eLength As Integer
        Dim x As Integer
        Dim AttachtoEnd As String
        Dim strArray = InputString.Split(" ")
        Dim specialChars As New List(Of String)(New String() {"@", "!", ".", ",", "(", ")", "/", "#", "$", "&", "+", "-", "_", "=", ":", "'", "*", "^", "`", "<", ">", "[", "]", "{", "}", "\", "|", ControlChars.Quote})
        Dim firstChars As String = ""
        Dim LastChars As String = ""
        InputString = String.Empty
        For Each item As String In strArray
            Dim str As String = item
            firstChars = String.Empty
            LastChars = String.Empty
            For Each ch As Char In str
                If Not specialChars.Contains(ch) Then
                    Exit For
                Else
                    firstChars += ch
                End If
            Next
            For Each spChar As Char In firstChars.ToCharArray()
                str = str.Trim(spChar)
            Next
            For i As Integer = str.Length - 1 To 0 Step -1
                If Not specialChars.Contains(str(i)) Then
                    Exit For
                Else
                    LastChars = str(i) + LastChars
                End If
            Next
            For Each spChar As String In specialChars
                str = str.Trim(spChar)
            Next
            If Not String.IsNullOrWhiteSpace(str) Then
                For Each element In alWordList
                    If element.ToLower = str.ToLower Then
                        str = str.Trim()
                        eLength = element.Length
                        For x = 3 To eLength - 1
                            AttachtoEnd = AttachtoEnd & "*"
                            starPosition += 1
                        Next
                        str = str.Substring(0, str.Length - starPosition) & AttachtoEnd
                    End If
                    AttachtoEnd = ""
                    starPosition = 0
                Next
            End If
            InputString += firstChars + str + LastChars & " "
        Next
        Return InputString
    End Function

So now I think it is best to go back to the regex, which works very well and only needs to handle capital letters.

One last note: the words to check come in as an ArrayList.

Answer 1

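If you want to replace all "bad words" in a string so that the first 3 letters are kept and the rest is replaced with asterisks, like phu**, and you want the comparison to be case-insensitive, there is no built-in method that does all of this. You could use

Regex.Replace with RegexOptions.IgnoreCase, or
Microsoft.VisualBasic.Strings.Replace with CompareMethod.Text.

But when given a fixed replacement string, both have the drawback that they replace the old value with the new value, and the new value does not preserve the original casing. If the word is PHUCK and your list of "bad words" contains Phuck, it will be replaced with Phu** instead of PHU**.

Since you commented that this is important, one way to handle it is to write a custom method: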

Imports System.Runtime.CompilerServices
Imports System.Text

Module StringExtensions

    <Extension()>
    Public Function ReplaceBadWords(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
        Dim sb As StringBuilder = New StringBuilder(str)
        For Each badWord As String In badWords
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If

                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next

                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        Next

        Return sb.ToString()
    End Function

End Module

Your (silly) sample:

Dim replaced = "phuck will get check where as Phuck or PHUCK".
    ReplaceBadWords({ "Phuck", "ILL" }, StringComparison.CurrentCultureIgnoreCase)

Result:

phu** w*** get check where as Phu** or PHU**

If you have a large number of "bad words", here is a parallel version:

' Goes inside the same module; needs Imports System.Threading.Tasks at the top of the file.
<Extension()>
Public Function ReplaceBadWordsParallel(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
    Dim sb As StringBuilder = New StringBuilder(str)

    Parallel.ForEach(badWords, 
        Sub(badWord)
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If

                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next

                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        End Sub)

    Return sb.ToString()
End Function

Note that I have not checked whether the parallel version is thread-safe.
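On the thread-safety point, here is a rough sketch of one way to sidestep the question, not a verified fix. It assumes ordinal case-insensitive matching is acceptable and swaps the shared StringBuilder for a plain Char array that only ever receives asterisks, so the order in which threads write should not change the result. The names MaskParallel and keep are made up for this sketch.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module ParallelMaskSketch

    ' Hypothetical variant of the parallel version above; illustration only.
    Function MaskParallel(ByVal str As String, ByVal badWords As IEnumerable(Of String), Optional ByVal keep As Integer = 3) As String
        ' The source string is immutable and only read; threads write only to this array.
        Dim chars As Char() = str.ToCharArray()

        Parallel.ForEach(badWords,
            Sub(badWord As String)
                ' An empty word would match everywhere and never advance.
                If badWord.Length = 0 Then Return

                Dim index As Integer = str.IndexOf(badWord, StringComparison.OrdinalIgnoreCase)
                While index <> -1
                    ' Keep the first characters of long words; mask short words completely.
                    Dim firstMasked As Integer = If(badWord.Length > keep, index + keep, index)
                    For i As Integer = firstMasked To index + badWord.Length - 1
                        chars(i) = "*"c
                    Next
                    index = str.IndexOf(badWord, index + badWord.Length, StringComparison.OrdinalIgnoreCase)
                End While
            End Sub)

        Return New String(chars)
    End Function

    Sub Main()
        Dim words = New String() {"Phuck", "ILL"}
        Console.WriteLine(MaskParallel("phuck will get check where as Phuck or PHUCK", words))
        ' Prints: phu** w*** get check where as Phu** or PHU**
    End Sub

End Module
```

Because kept characters are never written back, overlapping bad words can only add asterisks; they cannot restore text that another word has already masked.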

C# version, if anyone is interested:

public static string ReplaceBadWords(this string str, IEnumerable<string> badWords, StringComparison comparison, int showClearTextLength = 3, char obfuscateChar = '*')
{
    StringBuilder sb = new StringBuilder(str);

    foreach (string badWord in badWords)
    {
        int index = str.IndexOf(badWord, comparison);
        while (index != -1)
        {
            string oldValue = str.Substring(index, badWord.Length);
            string newValue;
            if (badWord.Length > showClearTextLength)
            {
                newValue = oldValue.Remove(showClearTextLength) + new string(obfuscateChar, oldValue.Length - showClearTextLength);
            }
            else
            {
                newValue = new string(obfuscateChar, oldValue.Length);
            }
            for (int i = index; i < index + newValue.Length; i++)
                sb[i] = newValue[i - index];

            index += newValue.Length;
            index = str.IndexOf(badWord, index, comparison);
        }
    }           

    return sb.ToString();
}

Answer 2

If your initial code works, just make the Regex case-insensitive:

r = New Regex("\b" & element, RegexOptions.IgnoreCase)

Case-insensitive means the regex does not care whether letters are upper or lower case.

For details, see the documentation on Regular Expression Options.
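If the original casing of the matched word also needs to survive, a possible sketch is to combine RegexOptions.IgnoreCase with a MatchEvaluator, which builds the replacement from the text that was actually matched. The helper name MaskBadWords and the keep parameter are assumptions for illustration; Cast(Of String) is used only because the question says the words arrive as an ArrayList.

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module BadWordRegexSketch

    ' Hypothetical helper for illustration; the name and parameters are not from the answers.
    Function MaskBadWords(ByVal input As String, ByVal badWords As IEnumerable(Of String), Optional ByVal keep As Integer = 3) As String
        ' Escape every word so regex metacharacters in the list are matched literally.
        Dim escaped As List(Of String) = badWords.Select(Function(w) Regex.Escape(w)).ToList()
        If escaped.Count = 0 Then Return input

        ' Same shape as the question's pattern: a word boundary followed by the word.
        Dim pattern As String = "\b(?:" & String.Join("|", escaped) & ")"

        Return Regex.Replace(input, pattern,
            Function(m As Match) As String
                ' m.Value is the text as it appears in the input, so its casing survives.
                If m.Length > keep Then
                    Return m.Value.Substring(0, keep) & New String("*"c, m.Length - keep)
                End If
                Return New String("*"c, m.Length)
            End Function,
            RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' The question's word list is an ArrayList, so convert it to IEnumerable(Of String).
        Dim alWordList As New ArrayList(New String() {"phuck"})
        Dim words As IEnumerable(Of String) = alWordList.Cast(Of String)()
        Console.WriteLine(MaskBadWords("phuck, Phuck or PHUCK", words))
        ' Prints: phu**, Phu** or PHU**
    End Sub

End Module
```

Like the question's pattern, this only anchors the start of the word, so a longer word that begins with a listed word is partially masked as well; append "\b" to the pattern if whole-word matches are wanted.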

Replace "bad words" partially with asterisks, ignoring case and preserving the original case
