I have a regular expression that looks for patterns ending in one of a group of n-gram options. Here is the regex:
$regex = '/.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))\b([^.!?<>]{0,150})\b/i';
Here is the string I am matching against:
$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';
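The goal is to extract "is a Distributor, Fabricator, and Manufacturer" with the first capture group of the regex. The rest of the regex only defines context, and ideally the match ends at the end of the sentence or after a certain length.
Right now my first capture group stops too eagerly and matches only "is a Distributor". How can I make it continue through the whole list of keywords?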
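The behavior described above can be reproduced with a short script. This is a minimal sketch, assuming the pattern is run case-insensitively (the `i` flag), which the capitalized sample string needs in order to match at all. The names `$keywords` and `$m` are introduced here for illustration; the keyword list itself is copied from the question.

```php
<?php
// Keyword alternation copied from the question; $keywords is an illustrative name.
$keywords = 'assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency';

// Same structure as the question's pattern, with the i flag assumed.
$regex = '/.{0,150}\b(is (.{0,50}?)\b(' . $keywords . '))\b([^.!?<>]{0,150})\b/i';
$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';

if (preg_match($regex, $string, $m)) {
    // The lazy .{0,50}? stops at the first keyword it can reach.
    echo $m[1], PHP_EOL; // is a Distributor
    // The remaining keywords end up in the trailing context group instead.
    echo $m[4], PHP_EOL; // , Fabricator, and Manufacturer of textiles
}
```

Group 1 closes right after the first keyword, so everything after "Distributor" is consumed by the final `[^.!?<>]{0,150}` group rather than by the group of interest.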
Answer 0 (score: 1)
.{0,150}\b(is (.{0,50}?)\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency)(.*?\b(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))*)\b([^.!?<>]{0,150})\b
This very long regex does the job. See the demo.
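To check this answer against the sample string, the pattern can be assembled from a single keyword variable so the list is not typed twice. This is only a sketch: `$keywords`, `$m`, `$other` and `$m2` are names introduced here, the `i` flag is an assumption needed for the capitalized sample, and the second test string is made up for illustration.

```php
<?php
// Keyword alternation copied from the question; $keywords is an illustrative name.
$keywords = 'assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency';

// First keyword, then zero or more "(anything, lazily) + keyword" repetitions.
$regex = '/.{0,150}\b(is (.{0,50}?)\b(' . $keywords . ')(.*?\b(' . $keywords . '))*)\b([^.!?<>]{0,150})\b/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';
preg_match($regex, $string, $m);
echo $m[1], PHP_EOL; // is a Distributor, Fabricator, and Manufacturer

// Hypothetical second input: .*? also matches periods, so a keyword in a
// later sentence gets pulled into group 1 as well.
$other = 'ABC Inc. is a Distributor of textiles. Ask any supplier.';
preg_match($regex, $other, $m2);
echo $m2[1], PHP_EOL; // is a Distributor of textiles. Ask any supplier
```

If the capture should stay within one sentence, replacing `.*?` with a bounded class such as `[^.!?<>]*?` is one possible adjustment; that change is a suggestion here and not part of the answer.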
Answer 1 (score: 1)
A shorter version without the repetition (not in a code tag, because it is unreadable on a single line):
.{0,150}\b(is([^.!?<>]{0,50}(assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency))+)\b([^.!?<>]{0,150}\b)
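The idea is to allow a prefix of at most 50 characters (luckily there is only one such constant, so it is easy to find) before every keyword, whether or not it is a later keyword in the enumeration. To capture the whole enumeration, I added `+)` after the keyword list.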
Check it here.
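The shorter pattern can be tried the same way. The sketch below assumes the `i` flag again and uses illustrative names (`$keywords`, `$m`, `$long`, `$m2`); the second string is a made-up example in which the keyword list is spread over more than 50 characters, so the repeated group has to match more than once.

```php
<?php
// Keyword alternation copied from the question; $keywords is an illustrative name.
$keywords = 'assembler|builder|consulter|contracter|contractor|contract manufacturer|converter|designer|distributer|distributor|engineerer|fabricater|fabricator|formulater|formulator|installer|machiner|manufacturer|offerer|producer|provider|reseller|seller|supplier|wholesaler|machine shop|job shop|law firm|marketer|marketing agency';

// One repeated unit: up to 50 non-terminator characters followed by a keyword.
$regex = '/.{0,150}\b(is([^.!?<>]{0,50}(' . $keywords . '))+)\b([^.!?<>]{0,150}\b)/i';

$string = 'ABC Company Inc. is a Distributor, Fabricator, and Manufacturer of textiles. Another sentence.';
preg_match($regex, $string, $m);
echo $m[1], PHP_EOL; // is a Distributor, Fabricator, and Manufacturer

// Hypothetical longer input: the last keyword starts more than 50 characters
// after "is", so a single repetition could not reach it.
$long = 'XYZ Ltd. is a very well known international Distributor, Fabricator, Formulator, Installer, and Manufacturer of parts.';
preg_match($regex, $long, $m2);
echo $m2[1], PHP_EOL;
// is a very well known international Distributor, Fabricator, Formulator, Installer, and Manufacturer
```

Because the prefix class excludes `.`, `!`, `?`, `<` and `>`, the repeated group cannot run past the end of the sentence.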