C# regex: how to match user input against an array of words/phrases

Asked: 2015-05-06 06:49:58

Tags: c# arrays regex match

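I have an array containing different words and phrases. The user enters an email message, and I need to check whether it matches any of the words and phrases already in the array. Each match adds 1 to the score, and if the score exceeds 5, the message is flagged as likely spam.

My score is not increasing, and I don't know why.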

    string[] spam = new string[] {"-different words and phrases provided by programmer"};

    Console.Write("Key in an email message: ");
    string email = Console.ReadLine();
    int score = 0;

    string pattern = "^\\[a-zA-Z]";
    Regex expression = new Regex(pattern);
    var regexp = new System.Text.RegularExpressions.Regex(pattern);

    if (!regexp.IsMatch(email))
    {
        score += 1;
    }

2 answers:

Answer 0 (score: 0)

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string[] spam = new string[] { "test", "ak", "admin", "againadmin" };
            string email = "Its great to see that admin ak is not performing test.";
            string email1 = "Its great to see that admin ak is not performing test againadmin.";

            if (SpamChecker(spam, email))
            {
                Console.WriteLine("email spam");
            }
            else
            {
                Console.WriteLine("email not spam");
            }

            if (SpamChecker(spam, email1))
            {
                Console.WriteLine("email1 spam");
            }
            else
            {
                Console.WriteLine("email1 not spam");
            }

            Console.Read();
        }

        // Returns true once the running count of matches exceeds the threshold.
        private static bool SpamChecker(string[] spam, string email)
        {
            int score = 0;
            foreach (var item in spam)
            {
                // Regex.Escape treats each entry as literal text, not as a pattern.
                // This is a substring match: "admin" is also counted inside "againadmin".
                score += Regex.Matches(email, Regex.Escape(item), RegexOptions.IgnoreCase).Count;
                if (score > 3) // adjust the threshold as needed (the question uses 5)
                {
                    return true;
                }
            }

            return false;
        }
    }
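Answer 0 counts substring hits, so a short entry such as "admin" is also counted inside "againadmin". If whole-word matching is what you want (an assumption, the question does not say), a minimal sketch is to wrap each escaped entry in `\b` word boundaries. The class and method names below are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordSpamScore
{
    // Hypothetical helper: counts whole-word, case-insensitive occurrences
    // of every entry in the spam list.
    public static int Score(string[] spam, string email)
    {
        int score = 0;
        foreach (string item in spam)
        {
            // Regex.Escape keeps characters such as '.' or '$' literal;
            // \b requires a word boundary on both sides of the entry.
            string pattern = @"\b" + Regex.Escape(item) + @"\b";
            score += Regex.Matches(email, pattern, RegexOptions.IgnoreCase).Count;
        }
        return score;
    }

    public static void Main()
    {
        string[] spam = { "test", "ak", "admin", "againadmin" };
        string email1 = "Its great to see that admin ak is not performing test againadmin.";

        // "admin" inside "againadmin" is no longer counted, so the score is 4.
        Console.WriteLine(Score(spam, email1));
    }
}
```

Note that `\b` only behaves this way when an entry starts and ends with a letter, digit, or underscore; entries such as "$$$" would need a different boundary rule.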

Answer 1 (score: 0)

You can use LINQ to solve this:

  // Requires: using System; using System.Collections.Generic;
  //           using System.Linq; using System.Text.RegularExpressions;

  // HashSet<String> is for better performance (constant-time lookups)
  HashSet<String> spamWords = new HashSet<String>(
    "different words and phrases provided by programmer"
      .Split(new Char[] {' '}, StringSplitOptions.RemoveEmptyEntries)
      .Select(word => word.ToUpper()));

  ...

  String eMail = "phrases, not words and letters zzz";

  ... 

  // score == 3: "phrases" + "words" + "and"
  int score = Regex
    .Matches(eMail, @"\w+")          // split the message into words
    .OfType<Match>()
    .Select(match => match.Value.ToUpper())
    .Sum(word => spamWords.Contains(word) ? 1 : 0);

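In this implementation I look for spam words in a case-insensitive manner (so And, and, AND will all be counted as spam words). To take plurals and -ing forms into account (e.g. word, wording), you have to use a stemmer.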
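Answer 1 compares the message one word at a time, so a multi-word phrase in the list can never match. Since the question mentions phrases, here is a minimal sketch that joins all escaped entries into a single case-insensitive alternation. The class name, helper names, and sample entries are hypothetical and not from the original post; the threshold of 5 comes from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhraseSpamScore
{
    // Hypothetical helper: builds one case-insensitive pattern from all entries.
    public static Regex BuildPattern(string[] entries)
    {
        // Longer entries come first so that "free money" is tried before "free".
        var alternatives = entries
            .OrderByDescending(e => e.Length)
            .Select(e => Regex.Escape(e));

        return new Regex(@"\b(?:" + string.Join("|", alternatives) + @")\b",
                         RegexOptions.IgnoreCase);
    }

    // Counts non-overlapping occurrences of any entry in the message.
    public static int Score(Regex pattern, string email)
    {
        return pattern.Matches(email).Count;
    }

    public static void Main()
    {
        // Sample entries for illustration only.
        string[] spam = { "free money", "winner", "click here" };
        Regex pattern = BuildPattern(spam);

        int score = Score(pattern, "WINNER! Click here for free money.");
        Console.WriteLine(score > 5 ? "likely spam" : "probably not spam");
    }
}
```

Building the pattern once and reusing it avoids re-parsing a regex per entry for every message. Like Answer 1, this does not handle plurals or other inflected forms.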