How to make a PHP regex option group not eager?

Time: 2015-04-01 18:44:03

Tags: php regex nlp

I have a regular expression that looks for patterns ending in an option group of n-grams. Here is the regex:

$regex = '/.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))\b([^.!?<>]{0,150})\b/';

Here is the string I am matching against:

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

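The goal is to extract "is a Distributor, Fabricator, and Manufacturer" using the first capturing group of the regex. The rest of the regex only defines context, which should ideally end at the end of the sentence or after a certain length.

Right now my first capturing group is too eager: it stops at the first keyword and only matches "is a Distributor". How can I make it not eager?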
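The behaviour described above can be reproduced with a minimal sketch. Two things here are assumptions rather than part of the original question: the keyword list is abbreviated to a handful of entries for readability, and the pattern is given the case-insensitive `i` modifier, since the sample string capitalizes its keywords.

```php
<?php
// Abbreviated keyword list (assumption: shortened for readability).
$keywords = 'distributer|distributor|fabricater|fabricator|manufacturer';

// Same shape as the pattern in the question; the trailing "i" modifier is an assumption.
$regex = '/.{0,150}\b(is (.{0,50}?)\b(' . $keywords . '))\b([^.!?<>]{0,150})\b/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

preg_match($regex, $string, $m);

// The lazy .{0,50}? stops at the first keyword it can reach.
echo $m[1], PHP_EOL; // is a Distributor
```

The remaining keywords end up in the trailing context group, because the first group has only one slot for a keyword.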

2 Answers:

Answer 0 (score: 1)

.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency)(.*?\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))*)\b([^.!?<>]{0,150})\b

This very long regex will do it. See the demo.

https://regex101.com/r/sJ9gM7/39
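For use from PHP, the same structure can be sketched by building the pattern from a keyword string instead of typing the list twice. The abbreviated keyword list, the `$keywords` variable and the `i` modifier are assumptions made for this illustration, not part of the answer.

```php
<?php
// Abbreviated keyword list (assumption: shortened for readability).
$keywords = 'distributer|distributor|fabricater|fabricator|manufacturer';

// First keyword, followed by zero or more "lazy filler + keyword" repetitions,
// all inside the first capturing group.
$regex = '/.{0,150}\b(is (.{0,50}?)\b(' . $keywords . ')'
       . '(.*?\b(' . $keywords . '))*)\b([^.!?<>]{0,150})\b/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

preg_match($regex, $string, $m);
echo $m[1], PHP_EOL; // is a Distributor, Fabricator, and Manufacturer
```

Note that the filler `.*?` has no length limit and also matches sentence punctuation, so a keyword appearing in a later sentence on the same line would be pulled into the first group as well.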

Answer 1 (score: 1)

A shorter version without the repetition (not in code tags, because a single line would be unreadable):

.{0,150}\b(is([^.!?<>]{0,50}(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))+)\b([^.!?<>]{0,150}\b)

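The idea is to allow a prefix of at most 50 characters (luckily that constant appears only once, so it is easy to find) before every keyword, whether or not it is a later keyword in the enumeration. To capture the whole enumeration, I added +) after the keyword list.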

Check it here.
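A minimal PHP sketch of this shorter pattern follows. The abbreviated keyword list, the `$keywords` variable and the `i` modifier are assumptions for illustration only.

```php
<?php
// Abbreviated keyword list (assumption: shortened for readability).
$keywords = 'distributer|distributor|fabricater|fabricator|manufacturer';

// "Up to 50 non-terminator characters + keyword", repeated one or more times.
$regex = '/.{0,150}\b(is([^.!?<>]{0,50}(' . $keywords . '))+)\b([^.!?<>]{0,150}\b)/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

preg_match($regex, $string, $m);
echo $m[1], PHP_EOL; // is a Distributor, Fabricator, and Manufacturer
echo $m[4], PHP_EOL; // " of textiles" (leading space included)
```

Because the prefix is greedy and excludes `.`, `!`, `?`, `<` and `>`, each repetition reaches for the last keyword within 50 characters and never crosses the end of the sentence.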