I am trying to adapt an existing PHP regular expression to a slightly different document style.
This is the original style of the document:
**FOODS - TYPE A**
___________________________________
**PRODUCT**
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
**CODE**
Sell by date going back to February 1, 2009
The working PHP regex code below matches a line only if it is surrounded by asterisks, and stores the text on each side of the " - " as $m[1] and $m[2] respectively.
if ( preg_match('#^\*\*([^-]+)(?:-(.*))?\*\*$#', $line, $m) ) {
    // $m[2] is set only for **header - subheader**.
    if ( isset($m[2]) ) {
        return array(TYPE_HEADER, array(trim($m[1]), trim($m[2])));
    }
    else {
        return array(TYPE_KEY, array($m[1]));
    }
}
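So for line 1, $m[1] = "FOODS" and $m[2] = "TYPE A"; line 2 is skipped; for line 3, $m[1] = "PRODUCT"; and so on.
Question: how would I rewrite the regex match above if the headers have no asterisks, but are still all upper case and at least 4 characters long? For example: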
FOODS - TYPE A
___________________________________
PRODUCT
1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;
2) La Fe String Cheese
CODE
Sell by date going back to February 1, 2009
Thanks.
Answer 0 (score: 2)
Something along these lines (don't forget the "u" flag for Unicode regex):
^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$
^                # start of line
(?:\*\*)?        # two stars, optional
(?=[^*]{4,})     # followed by at least 4 non-star characters
(\p{Lu}+)        # group 1, Unicode upper case letters
(?:              # start no capture group
  \s*-\s*        # space*, dash, space*
  (\p{Lu}+)      # group 2, Unicode upper case letters
)?               # end no capture group, make optional
(?:\*\*)?        # two stars, optional
\s*              # optional trailing spaces
$                # end of line
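To try this pattern from PHP, a minimal sketch could look like the following. The function name parseUnicodeHeader is hypothetical, and the # delimiters are borrowed from the question's code. Note that, as written, both groups accept only letters, so a part containing a space such as "TYPE A" does not match.

```php
<?php
// Hypothetical helper (the name is not from the thread).
// Returns the captured groups, or null when the line is not a header.
function parseUnicodeHeader(string $line): ?array
{
    // Pattern from the answer, wrapped in # delimiters with the "u" flag
    // so that \p{Lu} and the length check operate on Unicode characters.
    $pattern = '#^(?:\*\*)?(?=[^*]{4,})(\p{Lu}+)(?:\s*-\s*(\p{Lu}+))?(?:\*\*)?\s*$#u';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // $m[2] is only present when the optional " - subheader" part matched.
    return isset($m[2]) ? [$m[1], $m[2]] : [$m[1]];
}
```

Here parseUnicodeHeader('PRODUCT') and parseUnicodeHeader('**PRODUCT**') both return the single group "PRODUCT", while parseUnicodeHeader('FOODS - TYPE A') returns null because of the space inside "TYPE A".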
Edit: simplified, per the comments:
^(?=[A-Z ]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$
^                # start of line
(?=[A-Z -]{4,})  # followed by at least 4 upper case characters, spaces or dashes
([A-Z ]+)        # group 1, upper case letters or space
(?:              # start no capture group
  -              # a dash
  ([A-Z ]+)      # group 2, upper case letters or space
)?               # end no capture group, make optional
\s*              # optional trailing spaces
$                # end of line
The contents of groups 1 and 2 must be trimmed before use.
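As a rough sketch of how the simplified pattern might slot into the question's code with the trimming applied: the function name classifyLine and the values of TYPE_HEADER and TYPE_KEY are assumptions (the question uses those constants without showing their definitions), and the pattern is the one-line form given above.

```php
<?php
// Assumed values: the question references these constants but never defines them.
const TYPE_HEADER = 'header';
const TYPE_KEY = 'key';

// Hypothetical wrapper around the simplified pattern.
// Returns null for lines that are not all-caps headers.
function classifyLine(string $line): ?array
{
    if (preg_match('#^(?=[A-Z ]{4,})([A-Z ]+)(?:-([A-Z ]+))?\s*$#', $line, $m)) {
        // Groups 1 and 2 keep the spaces around the dash, so trim both.
        if (isset($m[2])) {
            return [TYPE_HEADER, [trim($m[1]), trim($m[2])]];
        }
        return [TYPE_KEY, [trim($m[1])]];
    }
    return null;
}
```

With this, "FOODS - TYPE A" yields the header pair "FOODS" and "TYPE A", "PRODUCT" yields a key, and the numbered product lines yield null.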
Answer 1 (score: 1)
^([A-Z]{4,}(?:[A-Z ]*[A-Z])?)(?:\s*-\s*([A-Z]{4,}(?:[A-Z ]*)?))?$
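How about this? It matches a header that starts with an upper-case word of at least 4 letters, plus an optional subheader that also starts with at least 4 upper-case letters.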
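A small sketch for checking what this pattern accepts; the function name matchStrictHeader is made up for illustration, and the # delimiters are borrowed from the question's code. Because each part must begin with a run of at least four letters, a line such as "FOODS - ABC" is rejected.

```php
<?php
// Hypothetical helper (name not from the thread).
// Returns [header] or [header, subheader], or null when there is no match.
function matchStrictHeader(string $line): ?array
{
    $pattern = '#^([A-Z]{4,}(?:[A-Z ]*[A-Z])?)(?:\s*-\s*([A-Z]{4,}(?:[A-Z ]*)?))?$#';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }

    // Group 2 may carry trailing spaces, so trim it; group 1 always ends on a letter.
    return isset($m[2]) ? [$m[1], trim($m[2])] : [$m[1]];
}
```

For the sample document, matchStrictHeader('FOODS - TYPE A') returns "FOODS" and "TYPE A", and matchStrictHeader('PRODUCT') returns just "PRODUCT".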
Answer 2 (score: 1)
Regex:
^(?=.{4})([^-]+)(?:-(.*))?$
Explanation:
^            # start of line
(?=.{4})     # look ahead to make sure there are at least 4 characters
([^-]+)      # get all characters until it finds a dash character, if there is any
(?:-(.*))?   # optional: skip the dash and continue to get all characters until EOL
$            # end of line
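I assume you are only interested in lines that are at least 4 characters long.
Also, I cheated a little: the regex matches any characters, not just English upper-case letters, because that makes for a simpler expression. Anyway, if you want to make sure it accepts only upper-case letters, this should do it: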
^(?=.{4})([A-Z\s]+)(?:-([A-Z\s]+))?$
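To see the difference between the two variants, here is a quick sketch that runs both over the sample lines from the question. The constant and function names are invented for this example, and the # delimiters are borrowed from the question's code.

```php
<?php
// Both patterns from this answer (constant names are hypothetical).
const LOOSE_PATTERN = '#^(?=.{4})([^-]+)(?:-(.*))?$#';
const STRICT_PATTERN = '#^(?=.{4})([A-Z\s]+)(?:-([A-Z\s]+))?$#';

// Hypothetical helper: returns the lines that the given pattern accepts.
function headerLines(array $lines, string $pattern): array
{
    // preg_grep keeps the original keys; array_values reindexes from 0.
    return array_values(preg_grep($pattern, $lines));
}

$lines = [
    'FOODS - TYPE A',
    '___________________________________',
    'PRODUCT',
    '1) Mi Pueblito Queso Fresco Authentic Mexican Style Fresh Cheese;',
    '2) La Fe String Cheese',
    'CODE',
    'Sell by date going back to February 1, 2009',
];

$loose = headerLines($lines, LOOSE_PATTERN);   // accepts every sample line
$strict = headerLines($lines, STRICT_PATTERN); // only the three header lines
```

The first pattern accepts all seven lines, since each is at least 4 characters long, so only the second one is useful for telling headers apart from body text here.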
Answer 3 (score: 0)
So all you need to know is that a header starts with four upper-case ASCII letters? This should work:
'#^([A-Z]{4}[^-]*)(?:-(.*))?$#'
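A minimal sketch of using this pattern, with a hypothetical function name. One thing to keep in mind is that only the first four characters are checked for upper case, so a sentence that happens to start with a four-letter acronym also passes.

```php
<?php
// Hypothetical helper (name not from the thread).
// Returns the trimmed groups, or null when the line does not match.
function parseHeaderByPrefix(string $line): ?array
{
    if (!preg_match('#^([A-Z]{4}[^-]*)(?:-(.*))?$#', $line, $m)) {
        return null;
    }

    // Both groups can carry the spaces around the dash, so trim them.
    return isset($m[2]) ? [trim($m[1]), trim($m[2])] : [trim($m[1])];
}
```

For example, parseHeaderByPrefix('FOODS - TYPE A') gives "FOODS" and "TYPE A", while a made-up line like "NASA launched a probe" is also accepted as a single-group header.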