I'm trying to match a chapter number written as digits, as words, or as Roman numerals. Here are some samples:
CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO
I'm terrible at regular expressions; this is what I have so far:
(CHAPTER (([0-9]+)|(/* words - see below */)|( /* roman - see below */)))
// words
(TWENTY|THIRTY|etc)?( |-)?(ONE|TWO|THREE|FOUR|FIVE|etc)?
// roman
(I|II|III|IV|V|etc)+
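This expression handles CHAPTER 1, CHAPTER 2 and CHAPTER THREE, but it tries to match IV as a word (I'm guessing it somehow matches FIVE?). TWENTY TWO doesn't match at all.
Can anyone help? Here is the full regex: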
(CHAPTER (
([0-9]+)|
((TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)|
((I|II|III|IV|V)+)
))
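Note:
The point of this is to convert these textual representations into actual integers. I already have methods that do this for each case, so I only need to tell the cases apart.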
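Answer 0 (score: 1)
Since you already have the parsers, and hopefully they fail gracefully when given something that only superficially looks like valid Roman or text input, you can call them directly and see which one succeeds.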
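Calling the parsers in turn could look roughly like the Python sketch below. The three parsers (`parse_arabic`, `parse_roman`, `parse_words`) and the `chapter_number` helper are hypothetical stand-ins for the methods the question says already exist; the only assumption is that each one raises `ValueError` on input it cannot handle.

```python
import re

# Hypothetical stand-ins for the asker's existing parsers. Each one raises
# ValueError when the text is not in its format.
def parse_arabic(text):
    if not (text.isascii() and text.isdigit()):
        raise ValueError(text)
    return int(text)

ROMAN_RE = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def parse_roman(text):
    if not text or not ROMAN_RE.fullmatch(text):
        raise ValueError(text)
    total = 0
    for i, ch in enumerate(text):
        value = ROMAN_VALUES[ch]
        following = ROMAN_VALUES[text[i + 1]] if i + 1 < len(text) else 0
        # A smaller symbol in front of a larger one is subtracted (IV = 4).
        total += -value if following > value else value
    return total

SMALL = {word: n for n, word in enumerate(
    "ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN TWELVE THIRTEEN "
    "FOURTEEN FIFTEEN SIXTEEN SEVENTEEN EIGHTEEN NINETEEN".split(), start=1)}
TENS = {word: 10 * n for n, word in enumerate(
    "TWENTY THIRTY FORTY FIFTY SIXTY SEVENTY EIGHTY NINETY".split(), start=2)}

def parse_words(text):
    parts = text.replace("-", " ").split()
    if len(parts) == 1 and parts[0] in SMALL:
        return SMALL[parts[0]]
    if parts and parts[0] in TENS:
        if len(parts) == 1:
            return TENS[parts[0]]
        if len(parts) == 2 and SMALL.get(parts[1], 10) < 10:
            return TENS[parts[0]] + SMALL[parts[1]]
    raise ValueError(text)

def chapter_number(text):
    """Try each parser in turn and return the first successful result."""
    for parser in (parse_arabic, parse_roman, parse_words):
        try:
            return parser(text)
        except ValueError:
            continue
    raise ValueError(f"unrecognised chapter number: {text!r}")
```

With this approach no classifying regex is needed at all; the order of the parsers only matters if some input is valid in more than one format.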
If you don't want to just call them, then this regex should determine which parser to pass each input to.
var re = new Regex(
@"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)|(?<text>[A-Z ]+))");
Called, for example, like this:
var input = @"CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO";
foreach (Match match in re.Matches(input))
{
if (match.Groups["arabic"].Success)
{
Console.WriteLine("Pass {0} to Arabic parser", match.Groups["arabic"].Value);
}
else if (match.Groups["roman"].Success)
{
Console.WriteLine("Pass {0} to Roman parser", match.Groups["roman"].Value);
}
else if (match.Groups["text"].Success)
{
Console.WriteLine("Pass {0} to Text parser", match.Groups["text"].Value);
}
}
Result:
Pass 1 to Arabic parser
Pass 2 to Arabic parser
Pass THREE to Text parser
Pass IV to Roman parser
Pass TWENTY TWO to Text parser
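For comparison, the same dispatch can be sketched in Python. This is only an illustration, not part of the C# answer: Python's `re` module spells named groups `(?P<name>...)`, and the `classify` helper is a hypothetical name introduced here.

```python
import re

# Same pattern as the C# version, with Python's (?P<name>...) group syntax.
CHAPTER_RE = re.compile(
    r"CHAPTER (?:(?P<arabic>\d+)|(?P<roman>[IVXLCDM]+)|(?P<text>[A-Z ]+))"
)

def classify(text):
    """Return (kind, value) pairs for every chapter heading found in text."""
    # m.lastgroup is the name of the one named group that took part in the match.
    return [(m.lastgroup, m.group(m.lastgroup)) for m in CHAPTER_RE.finditer(text)]

sample = "CHAPTER 1\nCHAPTER 2\nCHAPTER THREE\nCHAPTER IV\nCHAPTER TWENTY TWO"
for kind, value in classify(sample):
    print(f"Pass {value} to {kind} parser")
```

Because the alternatives are tried left to right, the Roman branch has to come before the text branch; otherwise `[A-Z ]+` would claim IV as well.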
Answer 1 (score: 1)
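The regex for Roman numerals: \b(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b
(The lookahead (?=[MDCLXVI]) requires the numeral to start with a Roman letter. Every other part of the pattern is optional, so without it the pattern also matches the empty string in front of a word such as THREE.)
The regex for digits: \d+
The regex for words: [A-Z ]+
Combining all of these:
CHAPTER (?:(?<digits>\d+)|(?<roman>\b(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b)|(?<literal>[A-Z ]+))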
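The behaviour of the combined pattern can be checked with a short Python sketch. It assumes Python's `(?P<name>...)` group syntax and compares two variants: a "naive" one whose Roman branch consists only of optional parts, and a "guarded" one that adds the lookahead `(?=[MDCLXVI])`. The names `build`, `naive`, `guarded` and `kind` are hypothetical and exist only for this illustration.

```python
import re

# Strict Roman-numeral body; note that every part of it is optional.
ROMAN = r"M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})"

def build(roman):
    """Build the combined CHAPTER pattern around a given Roman sub-pattern."""
    return re.compile(
        r"CHAPTER (?:(?P<digits>\d+)|(?P<roman>" + roman + r")|(?P<literal>[A-Z ]+))"
    )

naive = build(r"\b" + ROMAN + r"\b")
guarded = build(r"\b(?=[MDCLXVI])" + ROMAN + r"\b")

def kind(pattern, text):
    """Return (group name, matched text) for the first chapter heading, if any."""
    m = pattern.search(text)
    return (m.lastgroup, m.group(m.lastgroup)) if m else None

for heading in ["CHAPTER 2", "CHAPTER THREE", "CHAPTER IV", "CHAPTER TWENTY TWO"]:
    print(heading, kind(naive, heading), kind(guarded, heading))
```

In the naive variant the Roman branch succeeds on an empty string at the word boundary before THREE, so the literal branch is never tried. The lookahead does not cover every case (a word that merely starts with a Roman letter can still yield an empty Roman match), so a parser that validates its input remains the safer final check.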
Answer 2 (score: 0)
CHAPTER (?:\d+|(?:XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XIX|XV|VI|IV|XI|IX|XX|II|X|V|I)|(?:(?P<d>TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY)?(?(d)(?: (?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?|(?:TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))))
Breakdown and explanation:
CHAPTER // match "CHAPTER " literally
(?:// then either:
\d+// 1: digits
|
(?:// or 2: Roman numerals (1 to 20) (note: make sure longer numerals come before their prefixes!)
XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XIX|XV|VI|IV|XI|IX|XX|II|X|V|I
)
|// or 3: words
(?:
(?P<d>// first, one of the literals "TWENTY", "THIRTY", etc...
TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY
)?// ...if possible
(?(d)// then, if the previous group matched...
(?: // ...a space (the one right after "(?:")...
(?:// ...and the numbers "ONE" to "NINE"
ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
)
)?// ...if possible.
|
(?:// otherwise, one of "ONE" to "NINETEEN" (teens first, so that FOUR cannot pre-empt FOURTEEN)
TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
)
)
)
)
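The `(?P<d>...)` group and the `(?(d)...|...)` conditional are both accepted by Python's `re` module, so the breakdown can be turned into a runnable sketch with `re.VERBOSE`. A few details here are assumptions of this sketch rather than part of the answer: comments use `#` (the `//` style above is only explanatory), the literal spaces are written as `[ ]` because verbose mode ignores bare whitespace, XIX is included, and the teens are listed before the single-digit words.

```python
import re

CHAPTER_RE = re.compile(r"""
    CHAPTER[ ]                      # the literal "CHAPTER "
    (?:
        \d+                         # 1: digits
      |
        (?:                         # 2: Roman numerals 1 to 20, longest first
            XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XIX
            |XV|VI|IV|XI|IX|XX|II|X|V|I
        )
      |
        (?P<d>                      # 3: words, optionally starting with a tens word
            TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY
        )?
        (?(d)                       # if a tens word matched...
            (?:[ ](?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?
          |                         # ...otherwise require ONE to NINETEEN
            (?:
                TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN
                |SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN
                |ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
            )
        )
    )
""", re.VERBOSE)

headings = [
    "CHAPTER 1", "CHAPTER 2", "CHAPTER THREE", "CHAPTER IV",
    "CHAPTER TWENTY TWO", "CHAPTER FOURTEEN", "CHAPTER XIX",
]
for heading in headings:
    match = CHAPTER_RE.search(heading)
    print(heading, "->", match.group(0) if match else None)
```

The pattern only recognises a heading; it does not say which of the three formats matched, so naming the digit and Roman branches as well would be needed before dispatching to separate parsers.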