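I have a bad-word replacement script in VB.NET that has caused a lot of problems. After much trial and error, the current code works, but it does not filter out words that contain capital letters.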
Private Function CheckForBadWords(ByVal InputString As String) As String
    Dim r As Regex
    Dim element As String
    Dim eLength As Integer
    Dim x As Integer
    Dim AttachtoEnd As String
    For Each element In alWordList
        r = New Regex("\b" & element)
        eLength = element.Length
        For x = 3 To eLength - 1
            AttachtoEnd = AttachtoEnd & "*"
        Next
        InputString = r.Replace(InputString, element, Left(element, 3) & AttachtoEnd)
        AttachtoEnd = ""
    Next
    Return InputString
End Function
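How can I check for words that contain capital letters? For example, phuck gets caught, whereas Phuck or PHUCK does not.
I tried this tutorial, but it is in C#, and I barely know VB.NET: http://www.dreamincode.net/forums/topic/67129-creating-a-bad-word-filter-functionality-in-aspnet-wc%23/
Adding more detail: with some help, the following seemed to work after many adjustments, but bugs remain, especially with single quotes, double quotes, or <br>s.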
Private Function CheckForBadWords(ByVal InputString As String) As String
    Dim starPosition As Integer = 0
    Dim element As String
    Dim eLength As Integer
    Dim x As Integer
    Dim AttachtoEnd As String
    Dim strArray = InputString.Split(" ")
    Dim specialChars As New List(Of String)(New String() {"@", "!", ".", ",", "(", ")", "/", "#", "$", "&", "+", "-", "_", "=", ":", "'", "*", "^", "`", "<", ">", "[", "]", "{", "}", "\", "|", ControlChars.Quote})
    Dim firstChars As String = ""
    Dim LastChars As String = ""
    InputString = String.Empty
    For Each item As String In strArray
        Dim str As String = item
        firstChars = String.Empty
        LastChars = String.Empty
        For Each ch As Char In str
            If Not specialChars.Contains(ch) Then
                Exit For
            Else
                firstChars += ch
            End If
        Next
        For Each spChar As Char In firstChars.ToCharArray()
            str = str.Trim(spChar)
        Next
        For i As Integer = str.Length - 1 To 0 Step -1
            If Not specialChars.Contains(str(i)) Then
                Exit For
            Else
                LastChars = str(i) + LastChars
            End If
        Next
        For Each spChar As String In specialChars
            str = str.Trim(spChar)
        Next
        If Not String.IsNullOrWhiteSpace(str) Then
            For Each element In alWordList
                If element.ToLower = str.ToLower Then
                    str = str.Trim()
                    eLength = element.Length
                    For x = 3 To eLength - 1
                        AttachtoEnd = AttachtoEnd & "*"
                        starPosition += 1
                    Next
                    str = str.Substring(0, str.Length - starPosition) & AttachtoEnd
                End If
                AttachtoEnd = ""
                starPosition = 0
            Next
        End If
        InputString += firstChars + str + LastChars & " "
    Next
    Return InputString
End Function
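So now I think it is best to go back to the regex version, which worked very well and only needs to handle capital letters.
One final note: the words to check come in as an ArrayList.
Answer 0 (score: 3)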
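If you want to replace every "bad word" in a string with a version that keeps the first 3 letters and replaces the rest with asterisks, such as phu**, and you want the comparison to be case-insensitive, there is no built-in method for that. You could use Regex.Replace with RegexOptions.IgnoreCase, or Microsoft.VisualBasic.Strings.Replace with CompareMethod.Text. But both have the drawback that they replace the old value with the new value, and the new value does not keep the casing of the old one. If the word is PHUCK and the "bad word" in your list is Phuck, it will be replaced with Phu** instead of PHU**.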
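To make that drawback concrete, here is a minimal sketch that runs both built-in calls with a fixed replacement string. The module name CaseLossDemo and the hard-coded replacement "Phu**" are only illustrative assumptions, not part of the original code.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module CaseLossDemo
    Sub Main()
        Dim input As String = "PHUCK"

        ' Both calls find the word regardless of case, but they insert the
        ' replacement text literally, so the casing of the input is lost.
        Dim viaRegex As String = Regex.Replace(input, "Phuck", "Phu**", RegexOptions.IgnoreCase)
        Dim viaStrings As String = Strings.Replace(input, "Phuck", "Phu**", Compare:=CompareMethod.Text)

        Console.WriteLine(viaRegex)   ' Phu**
        Console.WriteLine(viaStrings) ' Phu**
    End Sub
End Module
```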
Since you commented that this matters, the most direct way is to write a custom method:
Imports System.Runtime.CompilerServices
Imports System.Text

Module StringExtensions
    ' Masks every occurrence of each bad word, keeping the first
    ' showClearTextLength characters exactly as they appear in str.
    <Extension()>
    Public Function ReplaceBadWords(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
        Dim sb As StringBuilder = New StringBuilder(str)
        For Each badWord As String In badWords
            ' An empty search string matches at every index and would loop forever.
            If String.IsNullOrEmpty(badWord) Then Continue For
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                ' Read the match from the input so its original casing survives.
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    ' Short words are masked completely.
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If
                ' Overwrite in place; the length never changes, so indexes stay valid.
                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next
                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        Next
        Return sb.ToString()
    End Function
End Module
Your (silly) sample:
Dim replaced = "phuck will get check where as Phuck or PHUCK".
    ReplaceBadWords({ "Phuck", "ILL" }, StringComparison.CurrentCultureIgnoreCase)
Result:
phu** w*** get check where as Phu** or PHU**
A parallel version, in case you have a huge number of "bad words":
' Requires Imports System.Threading.Tasks
<Extension()>
Public Function ReplaceBadWordsParallel(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
    Dim sb As StringBuilder = New StringBuilder(str)
    Parallel.ForEach(badWords,
        Sub(badWord)
            ' An empty search string matches at every index and would loop forever.
            If String.IsNullOrEmpty(badWord) Then Return
            Dim index As Integer = str.IndexOf(badWord, comparison)
            While index <> -1
                Dim oldValue As String = str.Substring(index, badWord.Length)
                Dim newValue As String
                If badWord.Length > showClearTextLength Then
                    newValue = oldValue.Remove(showClearTextLength) & New String(obfuscateChar, oldValue.Length - showClearTextLength)
                Else
                    newValue = New String(obfuscateChar, oldValue.Length)
                End If
                For i As Integer = index To index + newValue.Length - 1
                    sb(i) = newValue(i - index)
                Next
                index += newValue.Length
                index = str.IndexOf(badWord, index, comparison)
            End While
        End Sub)
    Return sb.ToString()
End Function
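Note that I have not checked whether the parallel version is thread-safe. StringBuilder instance members are not guaranteed to be thread-safe, so use it with care.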
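One way to sidestep the shared StringBuilder is to let each worker only mark which positions should be hidden, and then build the result on a single thread. The sketch below is not from the answer: ReplaceBadWordsMasked, ParallelMaskSketch and the hide array are hypothetical names, and it assumes a match always spans badWord.Length characters, which holds for ordinal comparisons. Unlike the sequential loop, a character is hidden as soon as any word hides it, so the output does not depend on the order of the list.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Threading.Tasks

Module ParallelMaskSketch
    Public Function ReplaceBadWordsMasked(ByVal str As String, ByVal badWords As IEnumerable(Of String), ByVal comparison As StringComparison, Optional ByVal showClearTextLength As Integer = 3, Optional ByVal obfuscateChar As Char = "*"c) As String
        ' One flag per character; workers only ever write True, so concurrent
        ' writes to the same slot cannot conflict.
        Dim hide(str.Length - 1) As Boolean

        Parallel.ForEach(badWords,
            Sub(badWord)
                ' An empty search string would match everywhere and never advance.
                If String.IsNullOrEmpty(badWord) Then Return
                ' Words longer than the clear-text length keep their first characters.
                Dim keep As Integer = If(badWord.Length > showClearTextLength, showClearTextLength, 0)
                Dim index As Integer = str.IndexOf(badWord, comparison)
                While index <> -1
                    Dim stopAt As Integer = Math.Min(index + badWord.Length, str.Length)
                    For i As Integer = index + keep To stopAt - 1
                        hide(i) = True
                    Next
                    index = str.IndexOf(badWord, stopAt, comparison)
                End While
            End Sub)

        ' Single-threaded pass: copy the input and mask the flagged positions.
        Dim chars() As Char = str.ToCharArray()
        For i As Integer = 0 To chars.Length - 1
            If hide(i) Then chars(i) = obfuscateChar
        Next
        Return New String(chars)
    End Function

    Sub Main()
        Dim words As String() = {"Phuck", "ILL"}
        ' Prints: phu** w*** get check where as Phu** or PHU**
        Console.WriteLine(ReplaceBadWordsMasked("phuck will get check where as Phuck or PHUCK", words, StringComparison.OrdinalIgnoreCase))
    End Sub
End Module
```

Whether this is actually faster than the sequential loop depends on the size of the word list and the input, so it is worth measuring before adopting it.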
C# version, in case anyone is interested:
// Place inside a static class; needs using System, System.Collections.Generic and System.Text.
public static string ReplaceBadWords(this string str, IEnumerable<string> badWords, StringComparison comparison, int showClearTextLength = 3, char obfuscateChar = '*')
{
    StringBuilder sb = new StringBuilder(str);
    foreach (string badWord in badWords)
    {
        // An empty search string matches at every index and would loop forever.
        if (string.IsNullOrEmpty(badWord))
            continue;
        int index = str.IndexOf(badWord, comparison);
        while (index != -1)
        {
            // Read the match from the input so its original casing survives.
            string oldValue = str.Substring(index, badWord.Length);
            string newValue;
            if (badWord.Length > showClearTextLength)
            {
                newValue = oldValue.Remove(showClearTextLength) + new string(obfuscateChar, oldValue.Length - showClearTextLength);
            }
            else
            {
                newValue = new string(obfuscateChar, oldValue.Length);
            }
            for (int i = index; i < index + newValue.Length; i++)
                sb[i] = newValue[i - index];
            index += newValue.Length;
            index = str.IndexOf(badWord, index, comparison);
        }
    }
    return sb.ToString();
}
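Answer 1 (score: 1)
If your initial code works, just make the Regex case-insensitive:
r = New Regex("\b" & element, RegexOptions.IgnoreCase)
Case-insensitive means the regular expression does not care whether letters are upper or lower case.
For details, see the documentation for Regular Expression Options.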
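As a sketch of how that option could be combined with a match evaluator so the casing the user typed is kept, consider the following. MaskBadWords, MaskMatch and BadWordRegexSketch are hypothetical names introduced here. The sketch assumes each word starts and ends with a letter or digit, so that \b on both sides behaves as a whole-word test. It accepts a plain IEnumerable, so an ArrayList such as alWordList can be passed directly.

```vbnet
Imports System
Imports System.Collections
Imports System.Text.RegularExpressions

Module BadWordRegexSketch
    ' Masks every whole-word, case-insensitive match of the listed words,
    ' keeping the first keepLength characters exactly as they were typed.
    Public Function MaskBadWords(ByVal input As String, ByVal words As IEnumerable, Optional ByVal keepLength As Integer = 3) As String
        For Each element As String In words
            ' Regex.Escape guards against words that contain regex metacharacters.
            Dim r As New Regex("\b" & Regex.Escape(element) & "\b", RegexOptions.IgnoreCase)
            ' The evaluator receives the matched text, so its casing is preserved.
            input = r.Replace(input, Function(m As Match) MaskMatch(m.Value, keepLength))
        Next
        Return input
    End Function

    Private Function MaskMatch(ByVal matched As String, ByVal keepLength As Integer) As String
        ' Words no longer than keepLength are masked completely.
        If matched.Length <= keepLength Then
            Return New String("*"c, matched.Length)
        End If
        Return matched.Substring(0, keepLength) & New String("*"c, matched.Length - keepLength)
    End Function

    Sub Main()
        Dim alWordList As New ArrayList()
        alWordList.Add("phuck")
        ' Prints: phu**, Phu** and "PHU**"<br>
        Console.WriteLine(MaskBadWords("phuck, Phuck and ""PHUCK""<br>", alWordList))
    End Sub
End Module
```

Because the match is anchored on word boundaries, surrounding quotes, commas, or <br> tags are left untouched, and a longer word that merely contains a listed word is not masked.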