Question

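Take the following string: "Marketing and Cricket on the Internet".

I would like to find all the possible matches for "Ma" -any text- "et" using a regex. So..

Market
Marketing and Cricket
Marketing and Cricket on the Internet

The regex Ma.*et returns "Marketing and Cricket on the Internet". The regex Ma.*?et returns Market. But I'd like a regex that returns all 3. Is that possible?

Thanks.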

Answer 1

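As far as I know: No.

But you could match non-greedy first and then generate a new regexp with a quantifier to get the second match. The first match, Market, has two characters between "Ma" and "et", so the next pattern requires at least three. Like this: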

Ma.*?et
Ma.{3,}?et

...and so on...
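
A minimal Python sketch of that loop, in case it helps. The function name and the `head`/`tail` parameters are made up for illustration, and it assumes "any text" may include newlines (hence `re.DOTALL`). After each hit, the lower bound of the lazy quantifier is raised to one more than the gap just found.

```python
import re

def matches_by_rising_bound(text, head="Ma", tail="et"):
    """Collect successively longer matches of head + any text + tail."""
    results = []
    min_len = 0  # minimum number of characters allowed between head and tail
    while True:
        # Lazy quantifier with a lower bound, e.g. Ma.{3,}?et
        pattern = re.compile(
            re.escape(head) + ".{" + str(min_len) + ",}?" + re.escape(tail),
            re.DOTALL,
        )
        m = pattern.search(text)
        if m is None:
            break
        results.append(m.group(0))
        # The next match must have a strictly longer gap than this one
        min_len = len(m.group(0)) - len(head) - len(tail) + 1
    return results

print(matches_by_rising_bound("Marketing and Cricket on the Internet"))
```

For the string in the question this yields Market, Marketing and Cricket, and the full string. Note that it only handles a single "Ma" cleanly: once the first start position is exhausted, `search` moves on to later ones and can skip their shorter matches.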

Answer 2

Thanks guys, that really helped. Here's what I came up with for PHP:

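function preg_match_ubergreedy($regex, $text) {
    $matched = array(); // initialise so that "no matches" still returns an array

    // Try every possible length for the wildcard part
    for ($i = 0; $i < strlen($text); $i++) {
        // Turn "*" into an exact repeat count, e.g. ".*?" becomes ".{3}?"
        $exp = str_replace("*", "{" . $i . "}", $regex);
        // preg_match returns 1 on a match; checking it avoids reading an undefined $matches[0]
        if (preg_match($exp, $text, $matches)) {
            $matched[] = $matches[0];
        }
    }

    return $matched;
}
$text = "Marketing and Cricket on the Internet";
$matches = preg_match_ubergreedy("@Ma.*?et@is", $text);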

Answer 3

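Sadly, this is not possible with a standard POSIX regex, which returns a single match (the best candidate, per the regex rules). To accomplish this you will need an extension feature, which may be present in the particular programming language you are using this regex in, assuming you are using it in a program.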
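
If the language offers no such feature, one fallback is to do the enumeration in program code instead of in the regex itself. This is only a rough Python sketch of that idea, not a regex-engine feature: `enumerate_spans`, `head` and `tail` are hypothetical names, and it assumes "any text" may contain anything, newlines included.

```python
import re

def enumerate_spans(text, head="Ma", tail="et"):
    """Return every substring that starts with head and ends with a later tail."""
    # Zero-width lookaheads so overlapping occurrences are found as well
    starts = [m.start() for m in re.finditer("(?=" + re.escape(head) + ")", text)]
    tail_starts = [m.start() for m in re.finditer("(?=" + re.escape(tail) + ")", text)]
    found = []
    for s in starts:
        for t in tail_starts:
            # The tail has to begin at or after the end of the head
            if t >= s + len(head):
                found.append(text[s:t + len(tail)])
    return found

for span in enumerate_spans("Marketing and Cricket on the Internet"):
    print(span)
```

This pairs every "Ma" with every later "et", so it also covers strings that contain more than one "Ma".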

Answer 4

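For a more general regular expression, another option is to recursively match the greedy regular expression against the previous match, discarding the first and the last character in turn to ensure that you only match a substring of the previous match. After matching Marketing and Cricket on the Internet, we test both arketing and Cricket on the Internet and Marketing and Cricket on the Interne for submatches.

It goes something like this in C#...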

public static IEnumerable<Match> SubMatches(Regex r, string input)
{
    var result = new List<Match>();

    var matches = r.Matches(input);
    foreach (Match m in matches)
    {
        result.Add(m);

        if (m.Value.Length > 1)
        {
            string prefix = m.Value.Substring(0, m.Value.Length - 1);
            result.AddRange(SubMatches(r, prefix));

            string suffix = m.Value.Substring(1);
            result.AddRange(SubMatches(r, suffix));
        }

    }

    return result;
}

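This version could, however, end up returning the same submatch several times. For example, it would find Marmoset twice in Marketing and Marmosets on the Internet: first as a submatch of Marketing and Marmosets on the Internet, then as a submatch of Marmosets on the Internet.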
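
One way around the duplicates is to remember what has already been reported. Below is a rough Python port of the recursion with a `seen` set added. The set is my addition, not part of the C# above, and it deduplicates by matched text rather than by position, so two identical substrings at different offsets collapse into one entry.

```python
import re

def sub_matches(pattern, text, seen=None):
    """Recursively collect greedy matches and the matches nested inside them."""
    if seen is None:
        seen = set()
    results = []
    for m in pattern.finditer(text):
        value = m.group(0)
        if value in seen:
            continue  # already reported, and its submatches already explored
        seen.add(value)
        results.append(value)
        if len(value) > 1:
            # Drop the last character, then the first, and search again
            results.extend(sub_matches(pattern, value[:-1], seen))
            results.extend(sub_matches(pattern, value[1:], seen))
    return results

regex = re.compile(r"Ma.*et")
print(sub_matches(regex, "Marketing and Marmosets on the Internet"))
```

With this change Marmoset shows up once in the result for the Marmosets string instead of twice.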

Find all matches with a regex - greedy and non-greedy!
