I wrote a regular expression for durations:
([0-9]+ (?:[yY]ears?|[yY]rs?|[mM]o?nths?|[dD]a?ys?) ?)+
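To try the duration pattern outside the regex tool, here is a minimal Java sketch. It writes the character classes as [yY] rather than [y|Y], because inside square brackets | is an ordinary character, so [y|Y] would also match a literal pipe. The class name and the sample sentence are made up; the question's own test cases are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DurationRegexDemo {
    // The question's duration pattern, with [y|Y]-style classes written as [yY].
    static final Pattern DURATION =
            Pattern.compile("([0-9]+ (?:[yY]ears?|[yY]rs?|[mM]o?nths?|[dD]a?ys?) ?)+");

    // Returns every non-overlapping match in the input, with the trailing space trimmed.
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = DURATION.matcher(input);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input, not from the question.
        System.out.println(findAll("valid for 2 years 3 months or 10 days"));
        // prints [2 years 3 months, 10 days]
    }
}
```

Because the repeated group ends with an optional space, consecutive "number unit" pairs are merged into a single match.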
You can try it in this regex tool.
Test cases that match:
Test cases that should match but don't:
Question
Edit 1
I added the reFourDigits variable to handle cases like "Twelve hundred twenty", but it doesn't catch them. Please help. All the details of the problem are below.
public static final String reDigit = "(?:[Oo]ne|[tT]wo|[tT]hree|[fF]our|[fF]ive|[sS]ix|[sS]even|[eE]ight|[nN]ine)";
public static final String reTeen = "(?:[tT]wenty|[tT]hirty|[fF]orty|[fF]ifty|[sS]ixty|[sS]eventy|[eE]ighty|[nN]inety)";
public static final String re10_19 = "(?:[tT]en|[eE]leven|[tT]welve|[tT]hirteen|[fF]ourteen|[fF]ifteen|[sS]ixteen|[sS]eventeen|[eE]ighteen|[nN]ineteen)";
public static final String reTwoDigits = "(?:(?:" + reTeen + "[- ])?" + reDigit + "|" + re10_19 + "|" + reTeen + ")";
public static final String reThreeDigits = "(?:(?:" + reDigit + " hundred (?:and )?)?" + reTwoDigits + "|" + reDigit + " hundred)";
public static final String reFourDigits = "(?:" + reTwoDigits + " hundred (?:and)? " + reTwoDigits + ")";
public static final String reSixDigits = "(?:(?:" + reThreeDigits + " thousand (?:and )?)?" + reThreeDigits + "|" + reThreeDigits + " thousand|" + reFourDigits + ")";
public static final String reTwelveDigits = "(?:(?:" + reSixDigits + " million (?:and )?)?" + reSixDigits + "|" + reSixDigits + " million)";
Pattern:
String patternString = "\\b( ?(?:[,0-9]+|"+Constants.reTwelveDigits+") ?)\\b";
When I run it on "There are twenty hundred twenty two apples", it finds two strings, "twenty" and "twenty two", instead of "twenty hundred twenty two".
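A likely explanation: Java's regex alternation is ordered, not longest-match. In reSixDigits the reThreeDigits branch comes before reFourDigits, and "twenty" alone already satisfies it (the trailing " ?" and \b succeed too), so the engine never tries reFourDigits. Separately, " hundred (?:and)? " needs a second space before the next number whenever "and" is absent. The reduced reproduction below uses simplified, hypothetical names (TWO, FOUR_AS_ASKED, FIXED); the fix it shows (longest branch first, space moved inside the optional "and ") is one possible change, not code from the question or the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Simplified building blocks (hypothetical names), modeled on the question's constants.
    static final String DIGIT = "(?:one|two|three|four|five|six|seven|eight|nine)";
    static final String TENS = "(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)";
    static final String TEENS = "(?:ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen)";
    static final String TWO = "(?:(?:" + TENS + "[- ])?" + DIGIT + "|" + TEENS + "|" + TENS + ")";

    // Shaped like the question: short branch first, and "hundred (?:and)? "
    // requires a second space before the next number when "and" is absent.
    static final String FOUR_AS_ASKED = "(?:" + TWO + " hundred (?:and)? " + TWO + ")";
    static final String AS_ASKED = "\\b(?:" + TWO + "|" + FOUR_AS_ASKED + ")\\b";

    // One possible fix: longest branch first, trailing space inside the optional "and ".
    static final String FOUR_FIXED = "(?:" + TWO + " hundred (?:and )?" + TWO + ")";
    static final String FIXED = "\\b(?:" + FOUR_FIXED + "|" + TWO + ")\\b";

    // Collects every non-overlapping match, ignoring case.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "There are twenty hundred twenty two apples";
        System.out.println(findAll(AS_ASKED, s)); // prints [twenty, twenty two]
        System.out.println(findAll(FIXED, s));    // prints [twenty hundred twenty two]
    }
}
```

Reordering alone is not enough here: with FOUR_AS_ASKED moved first, the double-space requirement still makes it fail, and the result is again [twenty, twenty two].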
Answer (score: 3)
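Personally, I would recommend a real parser. A regex is possible, but the pattern can get very long. Below I used DEFINE from the PHP (PCRE) regex dialect to avoid repeating subpatterns. If your regex engine has no such construct, you will have to expand each definition inline, which produces a rather long pattern. You can still avoid writing it out by hand by building the pattern string dynamically with simple string concatenation.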
(?(DEFINE)(?<Digit>one|two|three|four|five|six|seven|eight|nine))
(?(DEFINE)(?<Teen>twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety))
(?(DEFINE)(?<TwoDigits>((?&Teen)-)?(?&Digit)|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|(?&Teen)))
(?(DEFINE)(?<ThreeDigits>((?&Digit) hundred (and )?)?(?&TwoDigits)|(?&Digit) hundred))
(?(DEFINE)(?<SixDigits>((?&ThreeDigits) thousand (and )?)?(?&ThreeDigits)|(?&ThreeDigits) thousand))
(?(DEFINE)(?<TwelveDigits>((?&SixDigits) million (and )?)?(?&SixDigits)|(?&SixDigits) million))
Fiddle: http://regex101.com/r/oM4oF2
Add the definitions to your expression; then you can replace each [0-9]+ with (?:[0-9]+|(?&TwelveDigits)).
Edit: As far as I know, Java has no reusable subpatterns, so you have to expand the pattern fully.
String reDigit = "(?:one|two|three|four|five|six|seven|eight|nine)";
String reTeen = "(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)";
String reTwoDigits = "(?:(?:" + reTeen + "-)?" + reDigit + "|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|" + reTeen + ")";
String reThreeDigits = "(?:(?:" + reDigit + " hundred (?:and )?)?" + reTwoDigits + "|" + reDigit + " hundred)";
String reSixDigits = "(?:(?:" + reThreeDigits + " thousand (?:and )?)?" + reThreeDigits + "|" + reThreeDigits + " thousand)";
String reTwelveDigits = "(?:(?:" + reSixDigits + " million (?:and )?)?" + reSixDigits + "|" + reSixDigits + " million)";
String reNumeric = "\\b(?:[0-9]+|" + reTwelveDigits + ")\\b";
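A runnable sketch of the expanded pattern in Java. Two things are my assumptions rather than the answer's: Pattern.CASE_INSENSITIVE (the answer's words are lower case only), and reDuration, a hypothetical constant that applies the earlier "replace each [0-9]+" suggestion to the question's duration units. The class name is also made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExpandedNumberPattern {
    // The answer's expanded constants, with the Java type String.
    static final String reDigit = "(?:one|two|three|four|five|six|seven|eight|nine)";
    static final String reTeen = "(?:twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)";
    static final String reTwoDigits = "(?:(?:" + reTeen + "-)?" + reDigit
            + "|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|" + reTeen + ")";
    static final String reThreeDigits = "(?:(?:" + reDigit + " hundred (?:and )?)?" + reTwoDigits + "|" + reDigit + " hundred)";
    static final String reSixDigits = "(?:(?:" + reThreeDigits + " thousand (?:and )?)?" + reThreeDigits + "|" + reThreeDigits + " thousand)";
    static final String reTwelveDigits = "(?:(?:" + reSixDigits + " million (?:and )?)?" + reSixDigits + "|" + reSixDigits + " million)";
    static final String reNumeric = "\\b(?:[0-9]+|" + reTwelveDigits + ")\\b";

    // Hypothetical: the question's duration units, with [0-9]+ replaced by the
    // number alternative; case is handled by CASE_INSENSITIVE instead of [yY] classes.
    static final String reDuration =
            "\\b(?:(?:[0-9]+|" + reTwelveDigits + ") (?:years?|yrs?|mo?nths?|da?ys?) ?)+";

    // Collects every non-overlapping match (case-insensitive), trimming trailing spaces.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(reNumeric, "There are 12 apples and three hundred and five pears"));
        // prints [12, three hundred and five]
        System.out.println(findAll(reDuration, "It took two years three months and 5 days"));
        // prints [two years three months, 5 days]
    }
}
```

Note that this version only joins tens and units with a hyphen ("twenty-two"), as in the answer's reTwoDigits.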
I couldn't find a Java fiddle site, so I used JavaScript, whose regex engine is similar: http://jsfiddle.net/f6RmN/
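For the real-parser route suggested at the start of this answer, here is a minimal sketch, not the answerer's code. It assumes the number phrase has already been isolated (for example by one of the patterns above) and interprets it word by word: small words add to the current group, "hundred" multiplies it, and "thousand"/"million" close the group. It does not validate grammar. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

public class NumberWordParser {
    private static final Map<String, Long> SMALL = new HashMap<>();
    private static final Map<String, Long> SCALES = new HashMap<>();

    static {
        String[] units = {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
                "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen",
                "eighteen", "nineteen"};
        for (int i = 0; i < units.length; i++) {
            SMALL.put(units[i], (long) i);
        }
        String[] tens = {"twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};
        for (int i = 0; i < tens.length; i++) {
            SMALL.put(tens[i], 20L + 10L * i);
        }
        SCALES.put("thousand", 1_000L);
        SCALES.put("million", 1_000_000L);
    }

    // Interprets a phrase such as "twelve hundred twenty"; returns empty
    // if any word is not a recognized number word.
    static OptionalLong parse(String phrase) {
        long total = 0;   // value of completed thousand/million groups
        long current = 0; // value of the group currently being read
        for (String word : phrase.trim().toLowerCase(Locale.ROOT).split("[\\s-]+")) {
            if (word.equals("and")) {
                continue;
            } else if (SMALL.containsKey(word)) {
                current += SMALL.get(word);
            } else if (word.equals("hundred")) {
                current *= 100;
            } else if (SCALES.containsKey(word)) {
                total += current * SCALES.get(word);
                current = 0;
            } else {
                return OptionalLong.empty();
            }
        }
        return OptionalLong.of(total + current);
    }

    public static void main(String[] args) {
        System.out.println(parse("Twelve hundred twenty").getAsLong());     // prints 1220
        System.out.println(parse("twenty hundred twenty two").getAsLong()); // prints 2022
    }
}
```

Unlike the regex, this handles "twelve hundred twenty" and "twenty hundred twenty two" without any extra alternatives, because "hundred" simply scales whatever precedes it.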