Replace part of "bad words" with asterisks, ignoring case and preserving the original case

Date: 2017-12-04 08:46:13

Tags: vb.net

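I have a bad-word replacement script in VB.NET that has caused a lot of problems. After much trial and error, the current code works, but it does not filter words that contain capital letters.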

    Private Function CheckForBadWords(ByVal InputString As String) As String
        Dim r As Regex
        Dim element As String
        Dim eLength As Integer
        Dim x As Integer
        Dim AttachtoEnd As String
        For Each element In alWordList
            r = New Regex("\b" & element)
            eLength = element.Length
            For x = 3 To eLength - 1
                AttachtoEnd = AttachtoEnd & "*"
            Next
            InputString = r.Replace(InputString, element, Left(element, 3) & AttachtoEnd)
            AttachtoEnd = ""
        Next
        Return InputString
    End Function

How can I also check words that contain capital letters? For example, phuck gets checked, whereas Phuck or PHUCK do not.

I tried this tutorial, but it is in C# and I barely know VB.NET: http://www.dreamincode.net/forums/topic/67129-creating-a-bad-word-filter-functionality-in-aspnet-wc%23/

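Adding more detail: with some help, the following seemed to work after several tweaks, but bugs remain, especially with single quotes, double quotes, or <br>s.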

    Private Function CheckForBadWords(ByVal InputString As String) As String
        Dim starPosition As Integer = 0
        Dim element As String
        Dim eLength As Integer
        Dim x As Integer
        Dim AttachtoEnd As String
        Dim strArray = InputString.Split(" ")
        Dim specialChars As New List(Of String)(New String() {"@", "!", ".", ",", "(", ")", "/", "#", "$", "&", "+", "-", "_", "=", ":", "'", "*", "^", "`", "<", ">", "[", "]", "{", "}", "\", "|", ControlChars.Quote})
        Dim firstChars As String = ""
        Dim LastChars As String = ""
        InputString = String.Empty
        For Each item As String In strArray
            Dim str As String = item
            firstChars = String.Empty
            LastChars = String.Empty
            For Each ch As Char In str
                If Not specialChars.Contains(ch) Then
                    Exit For
                Else
                    firstChars += ch
                End If
            Next
            For Each spChar As Char In firstChars.ToCharArray()
                str = str.Trim(spChar)
            Next
            For i As Integer = str.Length - 1 To 0 Step -1
                If Not specialChars.Contains(str(i)) Then
                    Exit For
                Else
                    LastChars = str(i) + LastChars
                End If
            Next
            For Each spChar As String In specialChars
                str = str.Trim(spChar)
            Next
            If Not String.IsNullOrWhiteSpace(str) Then
                For Each element In alWordList
                    If element.ToLower = str.ToLower Then
                        str = str.Trim()
                        eLength = element.Length
                        For x = 3 To eLength - 1
                            AttachtoEnd = AttachtoEnd & "*"
                            starPosition += 1
                        Next
                        str = str.Substring(0, str.Length - starPosition) & AttachtoEnd
                    End If
                    AttachtoEnd = ""
                    starPosition = 0
                Next
            End If
            InputString += firstChars + str + LastChars & " "
        Next
        Return InputString
    End Function

So now I think it is best to go back to the regex, which works very well and only needs to handle capital letters.

One last note: the words to check come in as an ArrayList.

2 Answers:

Answer 0: (score: 3)

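If you want to replace all "bad words" in a string so that the first 3 letters are kept and the rest are replaced with asterisks, like phu**, and you want the comparison to be case-insensitive, there is no built-in method. You could use

  • Regex.Replace with RegexOptions.IgnoreCase
  • Microsoft.VisualBasic.Strings.Replace with CompareMethod.Text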

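But both have the drawback that they replace the old value with a fixed new value, and the new value does not preserve the old casing. If the word is PHUCK and the "bad word" in your list is Phuck, it is replaced with Phu** rather than PHU**.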
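To illustrate the drawback, here is a minimal sketch, assuming the replacement string is built from the list entry "Phuck"; the module name is hypothetical:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaseLossDemo
    Sub Main()
        Dim input As String = "phuck Phuck PHUCK"
        ' The replacement is a fixed string derived from the list entry,
        ' so every match ends up with the casing of that entry.
        Dim replaced As String = Regex.Replace(input, "phuck", "Phu**", RegexOptions.IgnoreCase)
        Console.WriteLine(replaced) ' Phu** Phu** Phu**
    End Sub
End Module
```

All three spellings match, but the visible letters come from the replacement string rather than from the input.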

Since you commented that this matters, the only option is to write a custom method:

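Imports System
Imports System.Collections.Generic
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringExtensions

    ' Masks every occurrence of each bad word, keeping the first showClearTextLength
    ' characters exactly as they appear in the input so the original casing survives.
    <Extension()>
    Public Function ReplaceBadWords(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
        Dim sb As StringBuilder = New StringBuilder(str)
        For Each badWord As String In badWords
            ' An empty entry would match at every index and never advance, so skip it.
            If String.IsNullOrEmpty(badWord) Then Continue For

            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                ' Take the text from the input, not from the list, to keep its casing.
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    ' Words that are too short are masked completely.
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If

                ' Overwrite in place; the length never changes, so indexes stay valid.
                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next

                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        Next

        Return sb.ToString()
    End Function

End Module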

Your (silly) sample:

Dim replaced = "phuck will get check where as Phuck or PHUCK".
    ReplaceBadWords({ "Phuck", "ILL" }, StringComparison.CurrentCultureIgnoreCase)

Result:

phu** w*** get check where as Phu** or PHU**
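The question says the words arrive as an ArrayList, while the method above takes an IEnumerable(Of String). One way to bridge the two, sketched here on the assumption that every entry in the list is a String, is LINQ's Cast(Of String). The variable name alWordList comes from the question; the module name is hypothetical:

```vbnet
Imports System
Imports System.Collections
Imports System.Collections.Generic
Imports System.Linq

Module ArrayListDemo
    Sub Main()
        ' alWordList stands in for the ArrayList from the question.
        Dim alWordList As New ArrayList From {"phuck", "ill"}
        ' Cast(Of String) throws InvalidCastException on a non-String entry.
        Dim typed As IEnumerable(Of String) = alWordList.Cast(Of String)()
        Console.WriteLine(String.Join(", ", typed)) ' phuck, ill
    End Sub
End Module
```

The resulting sequence can then be passed as the badWords argument.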

If you have a large number of "bad words", here is a parallel version:

<Extension()>
Public Function ReplaceBadWordsParallel(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, ByVal Optional showClearTextLength As Integer = 3, ByVal Optional obfuscateChar As Char = "*"c) As String
    Dim sb As StringBuilder = New StringBuilder(str)

    Parallel.ForEach(badWords, 
        Sub(badWord)
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If

                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next

                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        End Sub)

    Return sb.ToString()
End Function

Note that I have not checked whether the parallel version is thread-safe.
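StringBuilder's instance members are not documented as thread-safe, so a conservative variant could serialize the writes with SyncLock. The following is only a sketch under that assumption, not a tested replacement. The names ReplaceBadWordsLocked and gate are hypothetical, and the visible prefix is fixed at 3 characters for brevity. It writes only asterisks, so the result does not depend on the order in which overlapping words are processed:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Threading.Tasks

Module ParallelSketch
    Function ReplaceBadWordsLocked(str As String, badWords As IEnumerable(Of String), comparison As StringComparison) As String
        Dim sb As New StringBuilder(str)
        Dim gate As New Object() ' guards every write to sb

        Parallel.ForEach(badWords,
            Sub(badWord)
                If String.IsNullOrEmpty(badWord) Then Return
                ' Keep 3 leading characters only when the word is longer than 3.
                Dim keep As Integer = If(badWord.Length > 3, 3, 0)
                Dim index As Integer = str.IndexOf(badWord, comparison)
                While index <> -1
                    SyncLock gate
                        ' Only asterisks are written; kept characters are left untouched.
                        For i As Integer = index + keep To index + badWord.Length - 1
                            sb(i) = "*"c
                        Next
                    End SyncLock
                    index = str.IndexOf(badWord, index + badWord.Length, comparison)
                End While
            End Sub)

        Return sb.ToString()
    End Function

    Sub Main()
        Dim result = ReplaceBadWordsLocked("phuck will get check where as Phuck or PHUCK",
                                           {"Phuck", "ILL"}, StringComparison.OrdinalIgnoreCase)
        Console.WriteLine(result) ' phu** w*** get check where as Phu** or PHU**
    End Sub
End Module
```

The searches read only the immutable input string, so they need no lock. Whether the parallel overhead pays off depends on the size of the word list.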

C# version, if anyone is interested:

public static string ReplaceBadWords(this string str, IEnumerable<string> badWords, StringComparison comparison, int showClearTextLength = 3, char obfuscateChar = '*')
{
    StringBuilder sb = new StringBuilder(str);

    foreach (string badWord in badWords)
    {
        int index = str.IndexOf(badWord, comparison);
        while (index != -1)
        {
            string oldValue = str.Substring(index, badWord.Length);
            string newValue;
            if (badWord.Length > showClearTextLength)
            {
                newValue = oldValue.Remove(showClearTextLength) + new string(obfuscateChar, oldValue.Length - showClearTextLength);
            }
            else
            {
                newValue = new string(obfuscateChar, oldValue.Length);
            }
            for (int i = index; i < index + newValue.Length; i++)
                sb[i] = newValue[i - index];

            index += newValue.Length;
            index = str.IndexOf(badWord, index, comparison);
        }
    }           

    return sb.ToString();
}

Answer 1: (score: 1)

If your initial code works, just make the Regex case-insensitive:

r = New Regex("\b" & element, RegexOptions.IgnoreCase)

Case-insensitive means the regular expression does not care whether letters are upper or lower case.

For details, see the documentation for Regular Expression Options.
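IgnoreCase alone finds the match, but a fixed replacement string still loses the original casing. A sketch of one way to keep it is to pass a MatchEvaluator, so the visible letters are taken from the matched text itself. The helper name MaskBadWords and the keepLength parameter are hypothetical. The pattern here is anchored with \b on both sides and escaped with Regex.Escape, which differs from the question's pattern (anchored at the start only):

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module RegexBadWordFilter
    ' Hypothetical helper: masks whole-word matches, keeping the first keepLength letters.
    Function MaskBadWords(input As String, badWords As IEnumerable, Optional keepLength As Integer = 3) As String
        For Each element As Object In badWords
            ' Escape the word so characters such as "." or "+" are matched literally.
            Dim pattern As String = "\b" & Regex.Escape(element.ToString()) & "\b"
            input = Regex.Replace(input, pattern,
                Function(m As Match)
                    ' m.Value is the text as it appears in the input, casing included.
                    If m.Length <= keepLength Then Return New String("*"c, m.Length)
                    Return m.Value.Substring(0, keepLength) & New String("*"c, m.Length - keepLength)
                End Function,
                RegexOptions.IgnoreCase)
        Next
        Return input
    End Function

    Sub Main()
        Dim words As New ArrayList From {"phuck"}
        Console.WriteLine(MaskBadWords("phuck, Phuck or ""PHUCK""<br>", words))
        ' phu**, Phu** or "PHU**"<br>
    End Sub
End Module
```

Because \b matches at the boundary between a letter and punctuation, surrounding quotes, commas, and tags are left alone.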