How do I find pattern matches in text, with exceptions when the match occurs inside a particular substring?

Time: 2018-12-31 20:52:33

Tags: c#

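I want to determine whether any string from a list of rejectable strings occurs in some text, but only when that occurrence is not inside a string from a list of allowable strings.

Simple example:

Text: "The quick red fox jumped over the lazy brown dog in front of the farmer."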

rejectableStrings: "fox", "dog", "farmer"

allowableStrings: "quick red fox", "smurfy blue fox", "lazy brown dog", "old green farmer"

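So the flag should be raised if any of the strings "fox", "dog", or "farmer" is found in the text, unless the found string is contained within one of the allowable strings at the same position in the text where the rejectable string was found.

Sample logic, not yet complete: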

string status = "allowable";
foreach (string rejectableString in rejectableStrings)
{
  // check if rejectableString is found as a whole word with either a space or start/end of string surrounding the flag
  // https://stackoverflow.com/a/16213482/56082
  string invalidValuePattern = string.Format(@"(?<!\S){0}(?!\S)", rejectableString);
  if (Regex.IsMatch(text, invalidValuePattern, RegexOptions.IgnoreCase))
  {
    // it is found so we initially raise the flag to check further
    status = "flagged";
    foreach (string allowableString in allowableStrings)
    {
      // only need to consider allowableString if it contains the rejectableString, otherwise ignore
      if (allowableString.Contains(rejectableString)) 
      {
        // check if the found occurrence of the rejectableString in text is actually contained within a relevant allowableString

        // *** the area that needs attention *** 
        if ('rejectableString occurrence found in text is also contained within the same substring allowableString of text')
        {
          // this occurrence of rejectableString is actually allowable, change status back to allowable and break out of the allowable foreach
          status = "allowable";
          break;
        } 
      }
    }
    if (status.Equals("flagged")) 
    {
      throw new Exception(rejectableString.ToUpper() + " found in text is not allowed.");
    }
  }
}

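Background for those interested: this is a SQL query validation method for an application. Its purpose is to reject queries that contain permanent database modification commands, while still treating a query as valid when the invalid command found is actually a substring of a temporary table command, or falls under some other logical exception that should allow the command in the query. This is multi-database query validation, not specific to a single database product.

So real-world examples of rejectable and allowable strings are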

private string[] rejectableStrings = {"insert","update","set","alter",
   "create","delete"};
private string[] allowableStrings = { "insert into #", "create table #",
   "create global temporary table ", "create temporary tablespace ", "offset "};

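and the text would be a SQL query.

1 Answer:

Answer 0: (score: 3)

You can do this by first removing all of the allowable strings from the text and then checking what remains for any rejectable strings. That way, when you look for rejectable words, you are no longer looking at any occurrence that was part of an allowable string.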

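using System;
using System.Text.RegularExpressions;

public static class Program
{
   public static void Main(string[] args)
   {
      string[] rejectableStrings = new string[] {"fox", "dog", "farmer"};
      string[] allowableStrings = new string[] {"quick red fox", "smurfy blue fox", 
                                                "lazy brown dog", "old green farmer"};
      string teststr = "fox quick red fox";
      bool pass = true;

      // Remove every allowable phrase first, so only unprotected text remains.
      // Regex.Escape keeps characters in the phrase from being read as regex syntax.
      foreach (string allowed in allowableStrings)
      {
         teststr = Regex.Replace(teststr, Regex.Escape(allowed), "", RegexOptions.IgnoreCase);
      }

      // Any rejectable string still present was not part of an allowable phrase.
      foreach (string reject in rejectableStrings)
      {
         if (Regex.IsMatch(teststr, Regex.Escape(reject), RegexOptions.IgnoreCase))
         {
            pass = false;
         }
      }
      Console.WriteLine(pass); // False: the leading "fox" is not inside an allowable phrase
   }
}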
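
The removal approach above discards position information, and deleting text can in principle join neighboring characters into a new match. As an alternative, here is a minimal sketch of the position-based check the question describes: record where each allowable string occurs, then accept a rejectable hit only if it lies entirely inside one of those spans. The class and method names (`RejectableScanner`, `FindUnallowed`) are hypothetical and not part of the original answer. The whole-word pattern is the one from the question, and `Regex.Matches` returns only non-overlapping matches, so overlapping occurrences of the same allowable string are not all recorded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; a sketch, not a definitive implementation.
public static class RejectableScanner
{
    // Returns the first rejectable string that occurs as a whole word in text
    // without being fully covered by an allowable occurrence, or null if none.
    public static string FindUnallowed(string text, string[] rejectable, string[] allowable)
    {
        // Collect the [Start, End) span of every allowable occurrence in text.
        var allowedSpans = new List<(int Start, int End)>();
        foreach (string allowed in allowable)
        {
            foreach (Match m in Regex.Matches(text, Regex.Escape(allowed), RegexOptions.IgnoreCase))
            {
                allowedSpans.Add((m.Index, m.Index + m.Length));
            }
        }

        foreach (string reject in rejectable)
        {
            // Whole-word match: whitespace or start/end of string on both sides.
            string pattern = @"(?<!\S)" + Regex.Escape(reject) + @"(?!\S)";
            foreach (Match hit in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
            {
                int hitStart = hit.Index;
                int hitEnd = hit.Index + hit.Length;

                // The hit is allowed only if some allowable span contains it entirely.
                bool covered = allowedSpans.Exists(s => s.Start <= hitStart && hitEnd <= s.End);
                if (!covered)
                {
                    return reject;
                }
            }
        }
        return null;
    }
}
```

With the lists from the question, `FindUnallowed("fox quick red fox", ...)` returns `"fox"` because the first occurrence is outside any allowable span, while `FindUnallowed("insert into #tmp select 1", ...)` with the SQL lists returns `null`. A caller could throw the exception from the question whenever the result is not `null`.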
