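I have an array containing different words and phrases. The user enters an email message, and I need to check whether it matches any of the words and phrases in the array. Each match adds 1 to the score, and if the score exceeds 5, the message is probably spam.
My score is not increasing, and I don't know why.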
string[] spam = new string[] {"-different words and phrases provided by programmer"};
Console.Write("Key in an email message: ");
string email = Console.ReadLine();
int score = 0;
string pattern = "^\\[a-zA-Z]";
Regex expression = new Regex(pattern);
var regexp = new System.Text.RegularExpressions.Regex(pattern);
if (!regexp.IsMatch(email))
{
score += 1;
}
Answer 0 (score: 0)
static void Main(string[] args)
{
string[] spam = new string[] { "test", "ak", "admin", "againadmin" };
string email = "Its great to see that admin ak is not performing test.";
string email1 = "Its great to see that admin ak is not performing test againadmin.";
if (SpamChecker(spam, email))
{
Console.WriteLine("email spam");
}
else
{
Console.WriteLine("email not spam");
}
if (SpamChecker(spam, email1))
{
Console.WriteLine("email1 spam");
}
else
{
Console.WriteLine("email1 not spam");
}
Console.Read();
}
private static bool SpamChecker(string[] spam, string email)
{
int score = 0;
foreach (var item in spam)
{
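// Regex.Escape makes sure the spam entry is matched as literal text, not as a regex pattern
score += Regex.Matches(email, Regex.Escape(item), RegexOptions.Compiled | RegexOptions.IgnoreCase).Count;
if (score > 3) // adjust the threshold as needed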
{
return true;
}
}
return false;
}
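Note that this counts each spam entry anywhere in the text, including inside longer words, so `admin` is counted a second time inside `againadmin`. If whole-word matching is wanted instead, a minimal sketch could escape each entry and wrap it in `\b` word boundaries. The names `WholeWordSpamScore` and `Score` are made up for illustration, and the sketch assumes every entry starts and ends with a letter or digit, since `\b` behaves differently next to punctuation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordSpamScore
{
    // Hypothetical helper: counts whole-word, case-insensitive
    // occurrences of every spam entry in the email text.
    public static int Score(string[] spam, string email)
    {
        int score = 0;
        foreach (string item in spam)
        {
            // Regex.Escape treats the entry as literal text;
            // \b anchors the match to word boundaries on both sides.
            string pattern = @"\b" + Regex.Escape(item) + @"\b";
            score += Regex.Matches(email, pattern, RegexOptions.IgnoreCase).Count;
        }
        return score;
    }
}
```

For the second sample message, `WholeWordSpamScore.Score(spam, email1)` gives 4 (`test`, `ak`, `admin`, `againadmin` once each), whereas plain substring counting over all four entries would give 5.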
Answer 1 (score: 0)
You can use Linq to solve the problem:
// HashSet<String> is for better performance
HashSet<String> spamWords = new HashSet<String>(
"different words and phrases provided by programmer"
.Split(new Char[] {' '}, StringSplitOptions.RemoveEmptyEntries)
.Select(word => word.ToUpper()));
...
String eMail = "phrases, not words and letters zzz";
...
// score == 3: "phrases" + "words" + "and"
int score = Regex
.Matches(eMail, @"\w+")
.OfType<Match>()
.Select(match => match.Value.ToUpper())
.Sum(word => spamWords.Contains(word) ? 1 : 0);
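In this implementation I look for spam words in a case-insensitive manner (so And, and, AND will all be treated as spam words). To take plurals and -ing forms into account (i.e. word, wording), you have to use a stemmer.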