Regular expression for Pascal-like string literals

Date: 2010-12-20 15:41:57

Tags: c# regex

I am trying to match Pascal string literal input against the following pattern: @"^'([^']|(''))*'$", but it does not work. What is wrong with the pattern?

public void Run()
{             
    using(StreamReader reader = new StreamReader(String.Empty))
    {
        var LineNumber = 0;
        var LineContent = String.Empty;

        while(null != (LineContent = reader.ReadLine()))
        {
            LineNumber++;

            String[] InputWords = new Regex(@"\(\*(?:\w|\d)*\*\)").Replace(LineContent.TrimStart(' '), @" ").Split(' ');

            foreach(String word in InputWords)
            {
                Scanner.Scan(word);
            }

        }
    }
}

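I search each input line for Pascal comments and replace them with a space. I then split the line on spaces into substrings, which are matched against the following: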

private void Initialize()
{
    MatchingTable = new Dictionary<TokenUnit.TokenType, Regex>();

    MatchingTable[TokenUnit.TokenType.Identifier] = new Regex
    (
        @"^[_a-zA-Z]\w*$",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
    MatchingTable[TokenUnit.TokenType.NumberLiteral] = new Regex
    (
        @"(?:^\d+$)|(?:^\d+\.\d*$)|(?:^\d*\.\d+$)",
         RegexOptions.Compiled | RegexOptions.Singleline
    );
}
// ... Here it all comes together
public TokenUnit Scan(String input)
{                         
    foreach(KeyValuePair<TokenUnit.TokenType, Regex> node in this.MatchingTable)
    {
        if(node.Value.IsMatch(input))
        {
            return new TokenUnit
            {
                Type = node.Key                        
            };
        }
    }
    return new TokenUnit
    {
        Type = TokenUnit.TokenType.Unsupported
    };
}
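
One thing worth checking in the pipeline above is what Scan actually receives. The sketch below is an assumption about the setup, not something the post confirms: it supposes the string-literal pattern is tested per word in the same way as the other table entries. SplitDemo, SplitLikeRun and AnyWordIsLiteral are hypothetical names introduced only for illustration. It mimics the Split(' ') step from Run() and shows that a literal containing a space arrives as separate fragments, none of which is a complete literal.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // The anchored pattern from the question.
    private static readonly Regex StringLiteral = new Regex(@"^'([^']|(''))*'$");

    // Mimics the splitting step in Run(): trim leading spaces, then split on single spaces.
    public static string[] SplitLikeRun(string line)
    {
        return line.TrimStart(' ').Split(' ');
    }

    // True only if at least one fragment, taken on its own, is a complete string literal.
    public static bool AnyWordIsLiteral(string line)
    {
        return SplitLikeRun(line).Any(word => StringLiteral.IsMatch(word));
    }
}
```

With this sketch, SplitDemo.AnyWordIsLiteral("'hello'") is true, while SplitDemo.AnyWordIsLiteral("'hello world'") is false, because the scanner only ever sees 'hello and world' as separate words.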

1 Answer:

Answer 0 (score: 1)

The pattern appears to be correct, although it can be simplified:

^'(?:[^']+|'')*'$

Explanation:

^      # Match start of string
'      # Match the opening quote
(?:    # Match either...
 [^']+ # one or more characters except the quote character
 |     # or
 ''    # two quote characters (= escaped quote)
)*     # any number of times
'      # Then match the closing quote
$      # Match end of string

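If the input you are checking contains anything besides the Pascal string (for example, surrounding whitespace), this regex will fail.

So if you want to use the regex to find Pascal strings within a larger body of text, you need to remove the ^ and $ anchors.

If you also want to allow double-quoted strings, you need to extend the regex: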
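
As a minimal sketch of the unanchored variant, the following uses Regex.Matches to collect every single-quoted literal in a longer piece of text. PascalStringFinder and FindAll are hypothetical names chosen for this example. The pattern is the simplified one from above with ^ and $ dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PascalStringFinder
{
    // Simplified pattern from the answer, without the ^ and $ anchors.
    private static readonly Regex Literal = new Regex(@"'(?:[^']+|'')*'");

    // Returns every single-quoted Pascal literal found in the text, in order of appearance.
    public static List<string> FindAll(string text)
    {
        var found = new List<string>();
        foreach (Match m in Literal.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For example, PascalStringFinder.FindAll("writeln('It''s done'); x := 'a';") yields two entries: 'It''s done' and 'a'. Note that an unanchored search has no notion of comments or other context, so a quote inside a comment would be picked up as well.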

^(?:'(?:[^']+|'')*'|"(?:[^"]+|"")*")$

In C#:

foundMatch = Regex.IsMatch(subjectString, "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");

This regex will match strings such as:

'This matches.'
'This too, even though it ''contains quotes''.'
"Mixed quotes aren't a problem."
''

It will not match strings such as:

'The quotes aren't balanced or escaped.'
There is something 'before or after' the quotes.
    "Even whitespace is a problem."
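
To try the examples above, the anchored pattern can be wrapped in a small helper. This is only a sketch: PascalStringCheck and IsLiteral are hypothetical names, and the pattern is copied unchanged from the C# line above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PascalStringCheck
{
    // Anchored pattern accepting either a single-quoted or a double-quoted literal.
    private static readonly Regex Literal =
        new Regex("^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");

    // True if the whole input is exactly one string literal, with nothing before or after it.
    public static bool IsLiteral(string input)
    {
        return Literal.IsMatch(input);
    }
}
```

Calling PascalStringCheck.IsLiteral on each of the four matching examples returns true. Each of the three non-matching examples returns false, including the last one, where the only problem is the leading whitespace.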