Question

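I'm trying to use Regex.Matches, and it seems to behave differently from what I'm used to in other languages such as PHP. Here is what I'm trying to do:

I want to get all the forms from a particular web page, but when I try the following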

        String pattern = "(?i)<form[^<>]*>(.*)<\\/form>"; 
        MatchCollection matches = Regex.Matches(content, pattern );

        foreach (Match myMatch in matches)
        {
            MessageBox.Show(myMatch.Result("$1"));
        }

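Even though there are three forms on that page, this code displays nothing. It seems that when I use (.*) it simply skips over everything in the content.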

Answer 1

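By default, the Regex class makes the . operator NOT match line breaks (in .NET it matches every character except \n). Try replacing this:

    MatchCollection matches = Regex.Matches(content, pattern);

with:

    MatchCollection matches = Regex.Matches(content, pattern, RegexOptions.Singleline);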
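To see the effect of the option, here is a minimal sketch that runs the question's pattern against a small multi-line input with and without RegexOptions.Singleline. The class name, the CountMatches helper, and the sample HTML are made up for illustration; only Regex.Matches and RegexOptions come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical sample input: one form whose body spans several lines.
    public const string Content =
        "<form action=\"/a\">\n  <input name=\"q\">\n</form>";

    // The pattern from the question, unchanged.
    public const string Pattern = "(?i)<form[^<>]*>(.*)<\\/form>";

    // Counts how many times Pattern matches Content under the given options.
    public static int CountMatches(RegexOptions options)
    {
        return Regex.Matches(Content, Pattern, options).Count;
    }

    public static void Main()
    {
        // Without Singleline, "." stops at "\n", so the form body cannot be spanned.
        Console.WriteLine(CountMatches(RegexOptions.None));
        // With Singleline, "." also matches "\n" and the form is found.
        Console.WriteLine(CountMatches(RegexOptions.Singleline));
    }
}
```

With this sample input, the first call should find no match and the second should find one.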

Answer 2

Try something like this for the main part of the regular expression:

    String pattern = "<form[\\d\\D]*?</form>";

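This is the pattern I currently use to strip all tags of a particular type from a document, but it should work just as well for finding form tags. You can change the \d\D part if needed.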
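The reason this works without any options is that the class [\d\D] (a digit or a non-digit) matches every character, including line breaks, and the lazy *? stops at the first closing tag. A small sketch, assuming a made-up input with two multi-line forms (the class name and CountForms helper are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // Hypothetical sample input: two forms, each spanning several lines.
    public const string Content =
        "<form id=\"a\">\n<input>\n</form>\n<p>text</p>\n<form id=\"b\">\n</form>";

    // Counts the forms found by the [\d\D]*? pattern, with no RegexOptions set.
    public static int CountForms()
    {
        return Regex.Matches(Content, "<form[\\d\\D]*?</form>").Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountForms());
    }
}
```

Note that this pattern is case-sensitive as written, so it would miss tags written as <FORM>; adding (?i) or RegexOptions.IgnoreCase would be one way to handle that.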

Answer 3

    string pattern = @"(?is)<form[^<>]*>(.*?)</form>";

The regular expression should behave the same in PHP and C# (or rather, in PCRE and .NET). If you were getting minimal matches in PHP without the ?, you probably had the /U ("ungreedy") option set, for example:

    preg_match_all('~<form[^<>]*>(.*)</form>~isU', $subject, $matches);

or

    preg_match_all('~(?isU)<form[^<>]*>(.*)</form>~', $subject, $matches);

.NET has no equivalent of PCRE's ungreedy mode.
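As a rough illustration of why the explicit ? matters in .NET, the sketch below compares the lazy (.*?) pattern with the greedy (.*) one on a made-up input containing two forms. The class name, the FormBodies helper, and the sample HTML are assumptions for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LazyVersusGreedyDemo
{
    // Hypothetical sample input: two forms, with different tag casing.
    public const string Content =
        "<FORM id=\"a\">one</FORM>\n<form id=\"b\">two</form>";

    // Returns the text captured by group 1 for every match of the pattern.
    public static List<string> FormBodies(string pattern)
    {
        var bodies = new List<string>();
        foreach (Match m in Regex.Matches(Content, pattern))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        // Lazy quantifier: each form is matched separately.
        List<string> lazy = FormBodies(@"(?is)<form[^<>]*>(.*?)</form>");
        // Greedy quantifier: one match runs from the first <form> to the last </form>.
        List<string> greedy = FormBodies(@"(?is)<form[^<>]*>(.*)</form>");

        Console.WriteLine(lazy.Count);
        Console.WriteLine(greedy.Count);
    }
}
```

On this input the lazy pattern should yield two captures ("one" and "two"), while the greedy pattern should yield a single capture that swallows everything between the first opening tag and the last closing tag.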

C# Regex.Matches: problem with multiple match results
