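Regex to match a number written as words, digits, or Roman numerals

Time: 2014-09-19 21:51:23

Tags: c# regex numbers words roman-numerals

I'm trying to match a number written as digits, words, or Roman numerals. Here are some samples: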

CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO

I'm terrible at regex; this is what I have so far.

(CHAPTER (([0-9]+)|(/* words - see below */)|( /* roman - see below */)))

// words
(TWENTY|THIRTY|etc)?( |-)?(ONE|TWO|THREE|FOUR|FIVE|etc)?

// roman
(I|II|III|IV|V|etc)+

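This pattern handles CHAPTER 1, CHAPTER 2 and CHAPTER THREE fine, but it tries to match IV as a word (I'm guessing it is somehow matching FIVE?). TWENTY TWO doesn't match at all.

Can anyone help? Here is the full regex: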

(CHAPTER (
([0-9]+)|
((TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)|
((I|II|III|IV|V)+)
))

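Note:

The point of this is to convert these textual representations to actual integers. I already have a method for each case, so I need to tell the cases apart.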

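3 Answers:

Answer 0: (score: 1)

Since you already have the parsers, and they will hopefully fail gracefully when given something that only superficially looks like valid Roman or text input, you could simply call each of them and see which one succeeds.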
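As a rough illustration of the try-each-parser approach, here is a minimal sketch. `ParseArabic`, `ParseRoman` and `ParseWords` are hypothetical, deliberately lenient stand-ins for the parsers you already have (assumed here to return null on failure rather than throw); swap in your own.

```csharp
using System;
using System.Collections.Generic;

public static class ChapterNumber
{
    private static readonly Dictionary<char, int> RomanValues = new Dictionary<char, int>
    {
        ['I'] = 1, ['V'] = 5, ['X'] = 10, ['L'] = 50,
        ['C'] = 100, ['D'] = 500, ['M'] = 1000
    };

    // Tiny vocabulary for illustration only; a real word parser needs far more.
    private static readonly Dictionary<string, int> WordValues = new Dictionary<string, int>
    {
        ["ONE"] = 1, ["TWO"] = 2, ["THREE"] = 3, ["FOUR"] = 4, ["FIVE"] = 5,
        ["SIX"] = 6, ["SEVEN"] = 7, ["EIGHT"] = 8, ["NINE"] = 9,
        ["TWENTY"] = 20, ["THIRTY"] = 30
    };

    // Hypothetical stand-in: digits only.
    private static int? ParseArabic(string s) =>
        int.TryParse(s, out var n) ? n : (int?)null;

    // Hypothetical stand-in: sums numeral values, subtracting a smaller
    // numeral that precedes a larger one. Does not validate numeral order.
    private static int? ParseRoman(string s)
    {
        if (s.Length == 0) return null;
        int total = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (!RomanValues.TryGetValue(s[i], out var v)) return null;
            bool subtract = i + 1 < s.Length
                && RomanValues.TryGetValue(s[i + 1], out var next)
                && v < next;
            total += subtract ? -v : v;
        }
        return total;
    }

    // Hypothetical stand-in: adds up known words separated by spaces or hyphens.
    private static int? ParseWords(string s)
    {
        int total = 0;
        foreach (var word in s.Split(new[] { ' ', '-' }, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!WordValues.TryGetValue(word, out var v)) return null;
            total += v;
        }
        return total > 0 ? total : (int?)null;
    }

    // Try each parser in turn and return the first result that is not null.
    public static int? Parse(string text)
    {
        var parsers = new Func<string, int?>[] { ParseArabic, ParseRoman, ParseWords };
        foreach (var parse in parsers)
        {
            int? value = parse(text);
            if (value.HasValue) return value;
        }
        return null;
    }
}
```

`ChapterNumber.Parse` takes the text that follows "CHAPTER " and returns null when no parser accepts it. This only works if each parser rejects input meant for the others, which is the "fail gracefully" assumption above.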

If you would rather not just call them, this regex should determine which parser each input should be passed to.

var re = new Regex(
    @"CHAPTER (?:(?<arabic>\d+)|(?<roman>[IVXLCDM]+)|(?<text>[A-Z ]+))");

Called like this, for example:

var input = @"CHAPTER 1
CHAPTER 2
CHAPTER THREE
CHAPTER IV
CHAPTER TWENTY TWO";

foreach (Match match in re.Matches(input))
{
    if (match.Groups["arabic"].Success)
    {
        Console.WriteLine("Pass {0} to Arabic parser", match.Groups["arabic"].Value);
    }
    else if (match.Groups["roman"].Success)
    {
        Console.WriteLine("Pass {0} to Roman parser", match.Groups["roman"].Value);
    }
    else if (match.Groups["text"].Success)
    {
        Console.WriteLine("Pass {0} to Text parser", match.Groups["text"].Value);
    }
}

Result:

Pass 1 to Arabic parser
Pass 2 to Arabic parser
Pass THREE to Text parser
Pass IV to Roman parser
Pass TWENTY TWO to Text parser
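One likely explanation for the IV symptom in the question: every part of the word branch there is optional, so that branch can match the empty string, and the alternation settles on it before the Roman branch is ever tried. The sketch below reproduces this with the question's pattern (group names added for readability) next to one possible repair in which the word branch must consume at least one word. The class and member names are made up for the example, and the word lists are the question's abbreviated ones.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyBranchDemo
{
    // The question's pattern, with names added to the three branches.
    public static readonly Regex Original = new Regex(
        @"CHAPTER (?:(?<arabic>[0-9]+)" +
        @"|(?<words>(TWENTY|THIRTY)?( |-)?(ONE|TWO|THREE|FOUR|FIVE)?)" +
        @"|(?<roman>(I|II|III|IV|V)+))");

    // One possible repair: a tens word with an optional unit, or a unit alone.
    public static readonly Regex Repaired = new Regex(
        @"CHAPTER (?:(?<arabic>[0-9]+)" +
        @"|(?<words>(?:TWENTY|THIRTY)(?:[ -](?:ONE|TWO|THREE|FOUR|FIVE))?|ONE|TWO|THREE|FOUR|FIVE)" +
        @"|(?<roman>(I|II|III|IV|V)+))");

    // Report which named branch matched and what text it captured.
    public static string Describe(Regex re, string input)
    {
        Match m = re.Match(input);
        if (!m.Success) return "no match";
        foreach (var name in new[] { "arabic", "words", "roman" })
        {
            if (m.Groups[name].Success)
                return name + ":'" + m.Groups[name].Value + "'";
        }
        return "no group";
    }
}
```

With `Original`, "CHAPTER IV" comes back as the word branch with an empty capture; with `Repaired`, the same input reaches the Roman branch.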

Answer 1: (score: 1)

The regex for Roman numerals is: \bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b
The regex for digits: \d+
The regex for words: [A-Z ]+

Putting it all together:

CHAPTER (?:(?<digits>\d+)|(?<roman>\bM{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\b)|(?<literal>[A-Z ]+))
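One caveat worth checking before relying on this: every piece of the Roman sub-pattern is optional, so it can match the empty string between the two `\b` assertions, and the `roman` group may then win with an empty value for inputs such as CHAPTER THREE. A minimal sketch of the issue follows, with one possible guard: a lookbehind just before the closing `\b` that requires the last consumed character to be a numeral letter. The class and member names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RomanGuardDemo
{
    private const string RomanBody =
        @"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})";

    // The combined pattern as given in the answer.
    public static readonly Regex Combined = new Regex(
        @"CHAPTER (?:(?<digits>\d+)|(?<roman>\b" + RomanBody + @"\b)|(?<literal>[A-Z ]+))");

    // Same pattern, but the Roman branch must end right after a numeral
    // letter, which rules out the empty match following "CHAPTER ".
    public static readonly Regex Guarded = new Regex(
        @"CHAPTER (?:(?<digits>\d+)|(?<roman>\b" + RomanBody + @"(?<=[MDCLXVI])\b)|(?<literal>[A-Z ]+))");

    // Report which named branch matched and what text it captured.
    public static string Classify(Regex re, string input)
    {
        Match m = re.Match(input);
        if (!m.Success) return "no match";
        foreach (var name in new[] { "digits", "roman", "literal" })
        {
            if (m.Groups[name].Success)
                return name + ":'" + m.Groups[name].Value + "'";
        }
        return "no group";
    }
}
```

With `Guarded`, CHAPTER THREE falls through to the `literal` group while CHAPTER IV still lands in `roman`.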

Answer 2: (score: 0)

CHAPTER (?:\d+|(?:XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I)|(?:(?<d>TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY)?(?(d)(?: (?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?|(?:TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))))

Breakdown and explanation:

CHAPTER // match "CHAPTER " literally
    (?: // then either:
        \d+ // 1: digits
        |
        (?: // or 2: Roman numerals (I to XVIII, plus XX) (note: make sure to order them by length, longest first!)
            XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I
        )
        | // or 3: words
        (?:
            (?<d> // first, one of the literals "TWENTY", "THIRTY", etc...
                TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY
            )? // ...if possible
            (?(d) // then, if the previous group matched...
                (?: // ...a space...
                    (?: // ...and one of the numbers "ONE" to "NINE"
                        ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
                    )
                )? // ...if possible.
                |
                (?: // otherwise, one of "ONE" to "NINETEEN" (longer words first, so "FOURTEEN" is not cut short at "FOUR")
                    TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE
                )
            )
        )
    )
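In .NET the named group is written `(?<d>...)`, and the conditional `(?(d)yes|no)` tests whether that group captured anything. Below is a minimal C# sketch of the pattern, with the teen words listed before the units so that FOURTEEN is not cut short at FOUR. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    public static readonly Regex Chapter = new Regex(
        @"CHAPTER (?:\d+" +
        // Roman numerals I to XVIII plus XX, longest alternatives first.
        @"|(?:XVIII|XVII|XIII|VIII|XIV|XVI|XII|III|VII|XV|VI|IV|XI|IX|XX|II|X|V|I)" +
        // Optional tens word, captured as group d.
        @"|(?:(?<d>TWENTY|THIRTY|FORTY|FIFTY|SIXTY|SEVENTY|EIGHTY|NINETY)?" +
        // If d captured: an optional space plus a unit word.
        @"(?(d)(?: (?:ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))?" +
        // Otherwise: a standalone word, teens before units.
        @"|(?:TEN|ELEVEN|TWELVE|THIRTEEN|FOURTEEN|FIFTEEN|SIXTEEN|SEVENTEEN|EIGHTEEN|NINETEEN" +
        @"|ONE|TWO|THREE|FOUR|FIVE|SIX|SEVEN|EIGHT|NINE))))");

    // Return the matched text, or null when the input does not match.
    public static string MatchedText(string input)
    {
        Match m = Chapter.Match(input);
        return m.Success ? m.Value : null;
    }
}
```

Because the pattern has no trailing anchor, comparing `MatchedText(line)` with the whole line is a simple way to spot partial matches.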
