Best way to parse a custom filter syntax

Time: 2015-01-29 09:59:27

Tags: c# parsing filter tokenize

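I have a program that lets the user enter a filter into a text box in the column header of a DataGridView. This text is then parsed into a list of FilterOperations.

Currently I tokenize the string and then build the list in a huge for loop.

Which design patterns could I use to get rid of the huge for construct?

Are there other measures I could take to improve the design?

In its current state it is hard to add support for another operator or data type, or to build something other than a filter list. Suppose I need to replace the filter list with a built expression (which will be the case very soon) or build a SQL WHERE clause.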

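Filter syntax

The filter follows this syntax, which is valid for strings, numbers and DateTimes:

Range operator

lowerLimit .. upperLimit

29..52 is parsed into two elements in the filter list: "x >= 29" and "x <= 52"

LowerThan

.. upperLimit

..52 is parsed as "x < 52"

GreaterThan

lowerLimit ..

29.. is parsed as "x > 29"

Wildcard

*someText* is equivalent to x LIKE '%someText%' in SQL

String literals

Between single quotes ('), operators such as .. or * are ignored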

Tokens

So I defined three tokens:

RangeOperator for ..

Wildcard for *

Text for plain values and values in single quotes
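
The question does not show the tokenize helper, so the following is only a minimal sketch of how the three tokens could be produced. The Token and TokenType names match the code in the question, but their shape here, the FilterTokenizer class, and the whitespace trimming are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum TokenType { Text, RangeOperator, Wildcard }

public sealed class Token
{
    public Token(TokenType tokenType, string text)
    {
        TokenType = tokenType;
        Text = text;
    }

    public TokenType TokenType { get; private set; }
    public string Text { get; private set; }
}

// Hypothetical stand-in for the tokenize() helper, which the question does not show.
public static class FilterTokenizer
{
    public static List<Token> Tokenize(string filter)
    {
        var tokens = new List<Token>();
        var plain = new StringBuilder(); // unquoted text collected so far

        for (int i = 0; i < filter.Length; i++)
        {
            char c = filter[i];
            if (c == '\'')
            {
                // Quoted literal: copy everything up to the closing quote verbatim,
                // so ".." and "*" inside it are not treated as operators.
                FlushPlainText(tokens, plain);
                int close = filter.IndexOf('\'', i + 1);
                if (close < 0)
                    throw new FormatException("Unterminated string literal");
                tokens.Add(new Token(TokenType.Text, filter.Substring(i + 1, close - i - 1)));
                i = close;
            }
            else if (c == '*')
            {
                FlushPlainText(tokens, plain);
                tokens.Add(new Token(TokenType.Wildcard, "*"));
            }
            else if (c == '.' && i + 1 < filter.Length && filter[i + 1] == '.')
            {
                FlushPlainText(tokens, plain);
                tokens.Add(new Token(TokenType.RangeOperator, ".."));
                i++; // skip the second dot
            }
            else
            {
                plain.Append(c);
            }
        }

        FlushPlainText(tokens, plain);
        return tokens;
    }

    // Emits the pending unquoted text (trimmed) as a Text token, if there is any.
    private static void FlushPlainText(List<Token> tokens, StringBuilder plain)
    {
        string value = plain.ToString().Trim();
        plain.Clear();
        if (value.Length > 0)
            tokens.Add(new Token(TokenType.Text, value));
    }
}
```

With this sketch, FilterTokenizer.Tokenize("29..52") gives a Text token, a RangeOperator token and another Text token, while a quoted input such as 'a..b*' comes back as a single Text token.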

My ugly code that builds the list

public static FilterList<T> Parse<T>(string filter, string columnname, Type dataType) where T : class
        {
            if (dataType != typeof(float) && dataType != typeof(DateTime) && dataType != typeof(string))
                throw new NotSupportedException(String.Format("Data Type is not supported '{0}'", dataType));

            Token[] filterParts = tokenize(filter);
            filterParts = cleanUp(filterParts);

            StringBuilder sb = new StringBuilder();

            for (int i = 0; i < filterParts.Length; i++)
            {
                Token currentToken = filterParts[i];
                // Check and build the range filter
                if (currentToken.TokenType == TokenType.RangeOperator)
                {
                    if (filterParts.Length < 2)
                    {
                        throw new FilterException("Missing argument for RangeOperator");
                    }
                    if (filterParts.Length > 3)
                    {
                        throw new FilterException("RangeOperator can't be mixed with other operators");
                    }

                    if (i == 0)
                    {
                        if (filterParts.Length == 2)
                        {
                            // "Up to" operator (upper limit only)
                            Token right = filterParts[1];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("Text must have value");
                            if (right.Text.StartsWith("."))
                                throw new FilterException("Text starting with a dot is not valid");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Less, right.Text } };
                            //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightF } };
                                //filterString = String.Format("({0} < {1} OR {0} IS NULL)", columnname, rightF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Less, rightDt } };
                                //filterString = String.Format("({0} < '{1}' OR {0} IS NULL)", columnname, rightDT.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                        throw new FilterException("too many arguments");
                    }
                    if (i == 1)
                    {
                        if (filterParts.Length == 2)
                        {
                            // "From" operator (lower limit only)
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("Argument must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T> { { columnname, FilterOperator.Greater, left.Text } };
                            //filterString = String.Format("({0} > '{1}')", columnname, left.Text);
                            if (dataType == typeof(float))
                            {
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format '{0}'", left.Text));
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftF } };
                                //filterString = String.Format("({0} > {1})", columnname, leftF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime leftDt = parseDateTime(left.Text);
                                return new FilterList<T> { { columnname, FilterOperator.Greater, leftDt } };
                                //filterString = String.Format("({0} > '{1}')", columnname, leftDT.ToString(CultureInfo.InvariantCulture));
                            }
                            break;
                        }
                        else
                        {
                            // Range operator (both limits)
                            Token left = filterParts[0];
                            if (left.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(left.Text))
                                throw new FilterException("parameter must have value");

                            Token right = filterParts[2];
                            if (right.TokenType != TokenType.Text)
                                throw new FilterException("TextToken expected");
                            if (String.IsNullOrEmpty(right.Text))
                                throw new FilterException("parameter must have value");

                            if (dataType == typeof(string))
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, left.Text},
                                    {columnname, FilterOperator.LessOrEqual, right.Text}
                                };
                            //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, left.Text, right.Text);
                            if (dataType == typeof(float))
                            {
                                float rightF;
                                if (!float.TryParse(right.Text, out rightF))
                                    throw new FilterException(
                                        String.Format("right parameter has wrong format '{0}'", right.Text));
                                float leftF;
                                if (!float.TryParse(left.Text, out leftF))
                                    throw new FilterException(String.Format(
                                        "left parameter has wrong format '{0}'", left.Text));
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftF},
                                    {columnname, FilterOperator.LessOrEqual, rightF}
                                };
                                //filterString = String.Format("{0} >= {1} AND {0} <= {2}", columnname, leftF.ToString(CultureInfo.InvariantCulture), rightF.ToString(CultureInfo.InvariantCulture));
                            }
                            if (dataType == typeof(DateTime))
                            {
                                DateTime rightDt = parseDateTime(right.Text);
                                DateTime leftDt = parseDateTime(left.Text); 
                                return new FilterList<T>
                                {
                                    {columnname, FilterOperator.GreaterOrEqual, leftDt},
                                    {columnname, FilterOperator.LessOrEqual, rightDt}
                                };
                                //filterString = String.Format("{0} >= '{1}' AND {0} <= '{2}'", columnname, leftDT.ToString(CultureInfo.InvariantCulture), rightDT.ToString(CultureInfo.InvariantCulture));
                            }

                            break;
                        }
                    }
                    throw new FilterException("unexpected parameter");
                }
                // Build the string search
                if (currentToken.TokenType == TokenType.Wildcard)
                {
                    if (dataType != typeof(string))
                        throw new FilterException("Operator not allowed with this Data Type");
                    // Error if the data type is not string
                    sb.Append("%");
                }
                else if (currentToken.TokenType == TokenType.Text)
                    sb.Append(escape(currentToken.Text));
            }

            // Filter on the string value
            string text = sb.ToString();
            if (dataType == typeof(string))
                return new FilterList<T> { { columnname, FilterOperator.Like, text } };
            //filterString = String.Format("{0} LIKE '{1}' ESCAPE '\\'", columnname, text);
            if (dataType == typeof(DateTime))
            {
                DateTime dt = parseDateTime(text);
                return new FilterList<T> { { columnname, FilterOperator.Equal, dt } };
                //filterString = String.Format("{0} = '{1}'", columnname, DT.ToString(CultureInfo.InvariantCulture));
            }
            if (dataType == typeof(float))
            {
                float f;
                if (!float.TryParse(text, out f))
                    throw new FilterException(String.Format("parameter has wrong format '{0}'", text));
                return new FilterList<T> { { columnname, FilterOperator.Equal, f } };
                //filterString = String.Format("{0} = {1}", columnname, F.ToString(CultureInfo.InvariantCulture));
            }

            return null;
        }

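2 answers:

Answer 0 (score: 4)

You need to find a C# code generator based on Parsing Expression Grammars. It lets you define a grammar, which the generator then turns into code. That code is then able to parse text that conforms to the grammar you expect.

A very quick bit of google-fu suggests that peg-sharp could do the job.

To learn how to use PEG, you can try the online version of PEG.js, which follows pretty much the same workflow you will end up using:

  • Enter the PEG declaration (left window)
  • The JavaScript parser is updated on the fly (top right window)
  • The parser parses your input and produces a result (bottom right window)

As a proof of concept, here is a tentative implementation of your grammar that you can copy and paste into PEG.js (I wish I could manage to embed it in a Stack Overflow widget):

Here is the grammar: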

start
  = filters

filters
  = left:filter " " right:filters { return {filter: left, operation: "AND", filters: right};}
  / filter

filter
  = applicableRange:range {return {type: "range", range: applicableRange};}
 / openWord:wildcard  {return {type: "wildcard", word: openWord};}
 / simpleWord:word {return simpleWord;}
 / sentence:sentence {return sentence;}

sentence
 = "'" letters:[0-9a-zA-Z *.]* "'" {return {type: "sentence", value: letters.join("")};}

word "aword"
  = letters:[0-9a-zA-Z]+ { return {type: "word", value: letters.join("")}; }

wildcard
  = 
 "*" word:word "*" {return {type: "wildcardBoth", value: word};}
/ "*" word:word {return {type: "wildcardStart", value: word};}
/ word:word "*" {return {type: "wildcardEnd", value: word};}

range "range"
  = left:word? ".." right:word? {return {from: left, to: right};}

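Basically, the grammar lets you define the building blocks of your language and how they relate to each other. For example, a filter can be a range, a wildcard, a word, a sentence, or nothing at all (at least that is what I had in mind when defining the grammar; the last option is there to end the recursion in filters).

Besides these blocks, you can also define what is output when a block is encountered. In this case I output a JSON object that says what kind of filtering should happen and which parameters the filter has.

If you test the grammar with the following input: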

'testing range' 123..456 123.. ..abc 'and testing wildcards' word1* *word2 *word3* cool heh

You will get a structure describing the filters that should be built according to the grammar:

{
   "filter": {
      "type": "sentence",
      "value": "testing range"
   },
   "operation": "AND",
   "filters": {
      "filter": {
         "type": "range",
         "range": {
            "from": {
               "type": "word",
               "value": "123"
            },
            "to": {
               "type": "word",
               "value": "456"
            }
         }
      },
      "operation": "AND",
      "filters": {
         "filter": {
            "type": "range",
            "range": {
               "from": {
                  "type": "word",
                  "value": "123"
               },
               "to": null
            }
         },
         "operation": "AND",
         "filters": {
            "filter": {
               "type": "range",
               "range": {
                  "from": null,
                  "to": {
                     "type": "word",
                     "value": "abc"
                  }
               }
            },
            "operation": "AND",
            "filters": {
               "filter": {
                  "type": "sentence",
                  "value": "and testing wildcards"
               },
               "operation": "AND",
               "filters": {
                  "filter": {
                     "type": "wildcard",
                     "word": {
                        "type": "wildcardEnd",
                        "value": {
                           "type": "word",
                           "value": "word1"
                        }
                     }
                  },
                  "operation": "AND",
                  "filters": {
                     "filter": {
                        "type": "wildcard",
                        "word": {
                           "type": "wildcardStart",
                           "value": {
                              "type": "word",
                              "value": "word2"
                           }
                        }
                     },
                     "operation": "AND",
                     "filters": {
                        "filter": {
                           "type": "wildcard",
                           "word": {
                              "type": "wildcardBoth",
                              "value": {
                                 "type": "word",
                                 "value": "word3"
                              }
                           }
                        },
                        "operation": "AND",
                        "filters": {
                           "filter": {
                              "type": "word",
                              "value": "cool"
                           },
                           "operation": "AND",
                           "filters": {
                              "type": "word",
                              "value": "heh"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

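The principle is the same with a C# generator: the grammar is compiled into C# code that can parse your input, and you define what happens when the parser hits this or that block.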
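
To give a rough idea of what the C# side could look like, here is a hand-written sketch that accepts the same input and returns objects similar to the JSON above. It is not the output of peg-sharp or any other generator: it uses regular expressions tried in the same order as the grammar's alternatives, returns a flat list (implicitly ANDed) rather than the nested structure, and the ParsedFilter and FilterParser names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical result type, loosely modelled on the JSON objects shown above.
public sealed class ParsedFilter
{
    public string Type { get; set; }  // "range", "wildcardBoth", "wildcardStart", "wildcardEnd", "word" or "sentence"
    public string Value { get; set; } // text of the word or sentence; null for ranges
    public string From { get; set; }  // lower bound of a range, or null if open
    public string To { get; set; }    // upper bound of a range, or null if open
}

public static class FilterParser
{
    // A piece is either a quoted sentence or a run of non-whitespace characters.
    private static readonly Regex Piece = new Regex(@"'[^']*'|\S+");
    private static readonly Regex Range = new Regex(@"^([0-9a-zA-Z]+)?\.\.([0-9a-zA-Z]+)?$");
    private static readonly Regex Word = new Regex(@"^(\*)?([0-9a-zA-Z]+)(\*)?$");
    private static readonly Regex Sentence = new Regex(@"^'([0-9a-zA-Z *.]*)'$");

    public static List<ParsedFilter> Parse(string input)
    {
        var result = new List<ParsedFilter>();
        foreach (Match piece in Piece.Matches(input))
            result.Add(ParseOne(piece.Value));
        return result;
    }

    // Tries range, then wildcard/word, then sentence, like the ordered choice in the grammar.
    private static ParsedFilter ParseOne(string text)
    {
        Match m = Range.Match(text);
        if (m.Success)
            return new ParsedFilter { Type = "range", From = GroupOrNull(m.Groups[1]), To = GroupOrNull(m.Groups[2]) };

        m = Word.Match(text);
        if (m.Success)
        {
            bool leading = m.Groups[1].Success;
            bool trailing = m.Groups[3].Success;
            string type = leading && trailing ? "wildcardBoth"
                        : leading ? "wildcardStart"
                        : trailing ? "wildcardEnd"
                        : "word";
            return new ParsedFilter { Type = type, Value = m.Groups[2].Value };
        }

        m = Sentence.Match(text);
        if (m.Success)
            return new ParsedFilter { Type = "sentence", Value = m.Groups[1].Value };

        throw new FormatException("Cannot parse filter '" + text + "'");
    }

    private static string GroupOrNull(Group g)
    {
        return g.Success ? g.Value : null;
    }
}
```

Calling FilterParser.Parse on the test input above yields ten entries, one per filter. A generated PEG parser would give you the same kind of structure without the hand-maintained regular expressions.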

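If the grammar changes you will need to recompile it (although that can easily be made part of your build steps), but you will be able to generate a structure representing the parsed filters and use it to filter your search results.

A huge advantage of PEG is that the format is well known and there are plenty of online resources for learning it, so the knowledge transfers to other languages and uses.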

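Answer 1 (score: 1)

You can use Gold Parser, or any other means, to create the syntax tree. Here is the link: http://goldparser.org/

Beyond that, you can use the Visitor design pattern to generate the filter list: https://en.wikipedia.org/wiki/Visitor_pattern

Using these two, you can build a very extensible solution.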
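
A minimal sketch of the visitor idea, assuming the parser has already produced a small syntax tree. All type names here (FilterNode, RangeNode, LikeNode, IFilterVisitor, OperationListVisitor, SqlWhereVisitor) are hypothetical, values are kept as strings for brevity, and the column name is assumed to come from trusted code rather than user input.

```csharp
using System;
using System.Collections.Generic;

public interface IFilterVisitor<TResult>
{
    TResult VisitRange(RangeNode node);
    TResult VisitLike(LikeNode node);
}

public abstract class FilterNode
{
    public abstract TResult Accept<TResult>(IFilterVisitor<TResult> visitor);
}

public sealed class RangeNode : FilterNode
{
    public string From { get; set; } // null means no lower limit
    public string To { get; set; }   // null means no upper limit

    public override TResult Accept<TResult>(IFilterVisitor<TResult> visitor)
    {
        return visitor.VisitRange(this);
    }
}

public sealed class LikeNode : FilterNode
{
    public string Pattern { get; set; } // already uses % as the wildcard

    public override TResult Accept<TResult>(IFilterVisitor<TResult> visitor)
    {
        return visitor.VisitLike(this);
    }
}

// Produces entries in the "x >= 29" style used in the question.
public sealed class OperationListVisitor : IFilterVisitor<List<string>>
{
    private readonly string _column;
    public OperationListVisitor(string column) { _column = column; }

    public List<string> VisitRange(RangeNode node)
    {
        if (node.From != null && node.To != null)
            return new List<string> { _column + " >= " + node.From, _column + " <= " + node.To };
        if (node.From != null)
            return new List<string> { _column + " > " + node.From };
        if (node.To != null)
            return new List<string> { _column + " < " + node.To };
        throw new InvalidOperationException("A range needs at least one limit");
    }

    public List<string> VisitLike(LikeNode node)
    {
        return new List<string> { _column + " LIKE " + node.Pattern };
    }
}

// Produces a parameterised SQL WHERE fragment; the values are collected in Parameters.
public sealed class SqlWhereVisitor : IFilterVisitor<string>
{
    private readonly string _column;
    private readonly List<object> _parameters = new List<object>();

    public SqlWhereVisitor(string column) { _column = column; }
    public List<object> Parameters { get { return _parameters; } }

    // Stores the value and returns its placeholder name (@p0, @p1, ...).
    private string Param(object value)
    {
        _parameters.Add(value);
        return "@p" + (_parameters.Count - 1);
    }

    public string VisitRange(RangeNode node)
    {
        if (node.From != null && node.To != null)
            return _column + " >= " + Param(node.From) + " AND " + _column + " <= " + Param(node.To);
        if (node.From != null)
            return _column + " > " + Param(node.From);
        if (node.To != null)
            return _column + " < " + Param(node.To);
        throw new InvalidOperationException("A range needs at least one limit");
    }

    public string VisitLike(LikeNode node)
    {
        return _column + " LIKE " + Param(node.Pattern);
    }
}
```

The parsing code then only builds nodes, and each output format (filter list, SQL WHERE clause, later an expression tree) lives in its own visitor. Supporting a new output means adding a visitor; supporting a new operator means adding a node type and one method per visitor.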