I'm trying to match Pascal string literal input against the following pattern: @"^'([^']|(''))*'$", but it doesn't work. What's wrong with the pattern?
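public void Run()
{
    // Note: an empty path makes this StreamReader constructor throw ArgumentException; a real file path is needed here.
    using (StreamReader reader = new StreamReader(String.Empty))
    {
        var LineNumber = 0;
        var LineContent = String.Empty;
        while (null != (LineContent = reader.ReadLine()))
        {
            LineNumber++;
            // Replace (* ... *) comments that contain only word characters with a space, then split the line on spaces.
            String[] InputWords = new Regex(@"\(\*(?:\w|\d)*\*\)").Replace(LineContent.TrimStart(' '), @" ").Split(' ');
            foreach (String word in InputWords)
            {
                Scanner.Scan(word);
            }
        }
    }
}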
I search the input string for any Pascal comments, replace them with a space, and then split the input into substrings to match against the following:
private void Initialize()
{
    MatchingTable = new Dictionary<TokenUnit.TokenType, Regex>();
    // Identifier: a letter or underscore followed by any number of word characters.
    MatchingTable[TokenUnit.TokenType.Identifier] = new Regex
    (
        @"^[_a-zA-Z]\w*$",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
    // Number: an integer, or a decimal with digits on at least one side of the point.
    MatchingTable[TokenUnit.TokenType.NumberLiteral] = new Regex
    (
        @"(?:^\d+$)|(?:^\d+\.\d*$)|(?:^\d*\.\d+$)",
        RegexOptions.Compiled | RegexOptions.Singleline
    );
}
// ... Here it all comes together
public TokenUnit Scan(String input)
{
    // Return the token type of the first pattern that matches the whole word.
    foreach (KeyValuePair<TokenUnit.TokenType, Regex> node in this.MatchingTable)
    {
        if (node.Value.IsMatch(input))
        {
            return new TokenUnit
            {
                Type = node.Key
            };
        }
    }
    // No pattern matched.
    return new TokenUnit
    {
        Type = TokenUnit.TokenType.Unsupported
    };
}
Answer 0 (score: 1)
The pattern appears to be correct, although it can be simplified:
^'(?:[^']+|'')*'$
Explanation:
^ # Match start of string
' # Match the opening quote
(?: # Match either...
[^']+ # one or more characters except the quote character
| # or
'' # two quote characters (= escaped quote)
)* # any number of times
' # Then match the closing quote
$ # Match end of string
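As a quick check that the question's pattern and the simplified one behave the same, the sketch below runs both over a few sample inputs. The sample strings and the `PascalStringPatterns` class name are made up for illustration. Both patterns should accept `'abc'`, `'It''s'` and `''`, and reject `'It's'` (an unescaped quote) and `abc` (no quotes at all).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class holding the two patterns being compared.
static class PascalStringPatterns
{
    // Pattern from the question: each character is either a non-quote or a doubled quote.
    public static readonly Regex Original = new Regex(@"^'([^']|(''))*'$");

    // Simplified pattern: runs of non-quote characters or doubled quotes, without capture groups.
    public static readonly Regex Simplified = new Regex(@"^'(?:[^']+|'')*'$");
}

class Program
{
    static void Main()
    {
        // Made-up sample inputs: three valid literals, two invalid ones.
        string[] samples = { "'abc'", "'It''s'", "''", "'It's'", "abc" };
        foreach (string s in samples)
        {
            Console.WriteLine(
                $"{s,-8} original={PascalStringPatterns.Original.IsMatch(s)} " +
                $"simplified={PascalStringPatterns.Simplified.IsMatch(s)}");
        }
    }
}
```

The simplified version only drops the capturing groups and consumes runs of ordinary characters with `[^']+` instead of one character per iteration; it accepts the same strings.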
If the input you are checking contains anything other than the Pascal string (for example, surrounding whitespace), this regex will fail.
So if you want to use the regex to find Pascal strings within a larger body of text, you need to remove the ^ and $ anchors.
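A minimal sketch of the difference, assuming a hypothetical Pascal source line `writeln('Hello world');` and a hypothetical helper named `FindLiterals`: splitting the line on spaces, as the question's `Split(' ')` does, cuts the literal into `writeln('Hello` and `world');`, and neither piece satisfies the anchored pattern. Searching the whole line with `Regex.Matches` and the unanchored pattern finds the complete literal.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class for this illustration.
static class PascalStringSearch
{
    // Anchored: the whole input must be exactly one Pascal string literal.
    public static readonly Regex Anchored = new Regex(@"^'(?:[^']+|'')*'$");

    // Unanchored: finds Pascal string literals anywhere in a larger text.
    public static readonly Regex Unanchored = new Regex(@"'(?:[^']+|'')*'");

    // Returns the text of every string literal found in the input, left to right.
    public static string[] FindLiterals(string text) =>
        Unanchored.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
}

class Program
{
    static void Main()
    {
        // Hypothetical source line containing a string literal with a space in it.
        string line = "writeln('Hello world');";

        // Splitting on spaces breaks the literal apart, so no piece matches the anchored pattern.
        foreach (string word in line.Split(' '))
            Console.WriteLine($"{word} -> {PascalStringSearch.Anchored.IsMatch(word)}");

        // Searching without anchors finds the literal intact.
        foreach (string literal in PascalStringSearch.FindLiterals(line))
            Console.WriteLine($"found: {literal}");
    }
}
```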
If you want to allow double quotes as well, you need to extend the regex:
^(?:'(?:[^']+|'')*'|"(?:[^"]+|"")*")$
In C#:
foundMatch = Regex.IsMatch(subjectString, "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$");
This regex will match strings like
'This matches.'
'This too, even though it ''contains quotes''.'
"Mixed quotes aren't a problem."
''
It will not match strings like
'The quotes aren't balanced or escaped.'
There is something 'before or after' the quotes.
 "Even whitespace is a problem."  (with a space before and after the quotes)
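The examples above can be checked with a small program. This is a sketch: the `QuoteExamples` class and `IsQuoted` method are hypothetical names, and the whitespace example is assumed to have a space on each side of the quotes. Each line prints `True` or `False` followed by the tested string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class wrapping the pattern from the C# line above.
static class QuoteExamples
{
    // Single- or double-quoted string; a doubled quote character is an escaped quote.
    public const string Pattern = "^(?:'(?:[^']+|'')*'|\"(?:[^\"]+|\"\")*\")$";

    public static bool IsQuoted(string s) => Regex.IsMatch(s, Pattern);
}

class Program
{
    static void Main()
    {
        string[] shouldMatch =
        {
            "'This matches.'",
            "'This too, even though it ''contains quotes''.'",
            "\"Mixed quotes aren't a problem.\"",
            "''",
        };
        string[] shouldNotMatch =
        {
            "'The quotes aren't balanced or escaped.'",
            "There is something 'before or after' the quotes.",
            " \"Even whitespace is a problem.\" ", // assumes a space on each side
        };

        foreach (string s in shouldMatch)
            Console.WriteLine($"{QuoteExamples.IsQuoted(s)}: {s}");
        foreach (string s in shouldNotMatch)
            Console.WriteLine($"{QuoteExamples.IsQuoted(s)}: {s}");
    }
}
```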