Question

This is how far I've got. This works:

$urls = $this->match_all('/<a href="(http:\/\/www.imdb.de\/title\/tt.*?)".*?>.*?<\/a>/ms',
            $content, 1);

Now I want to do the same thing for a different website, but that site's links have a different structure: http://www.example.org/ANYTHING

I don't know what I'm doing wrong, but for the other website (example.org) it doesn't work.

This is what I tried:

$urls = $this->match_all('/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms',
    $content, 1);

Thanks for your help. Stack Overflow is great!

Answer 1

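ANYTHING is usually expressed as .*? (which you already use in your original regex). You can also use [^"]+ as the placeholder, which matches one or more characters up to the next double quote.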
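As a minimal sketch of the [^"]+ variant: the example below uses PHP's built-in preg_match_all rather than the asker's match_all helper (whose implementation is not shown), and the sample $content string is made up for illustration.

```php
<?php
// Hypothetical sample markup; the real $content would be the fetched page.
$content = '<a href="http://www.example.org/foo">Foo</a> '
         . '<a href="http://www.example.org/bar/baz">Bar</a>';

// [^"]+ matches one or more characters that are not a double quote,
// so the capture group cannot run past the end of the href value.
$pattern = '/<a href="(http:\/\/www\.example\.org\/[^"]+)"/';

preg_match_all($pattern, $content, $matches);

// $matches[1] holds the first capture group of every match.
$urls = $matches[1];
print_r($urls);
```

Because the negated character class stops at the closing quote by itself, this form needs neither the lazy quantifier nor the s modifier.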

Answer 2

It sounds like you want the following regex:

'/<a href="(http:\/\/example\.org\/.*?)".*?>.*?<\/a>/ms'

You can also use a different delimiter so that you don't have to escape the forward slashes:

'#<a href="(http://example\.org/.*?)".*?>.*?</a>#ms'

Note the escaped . in the domain name: you intend to match a literal ., not any character.
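The difference the escaped dot makes can be checked with a short sketch. The two test strings are invented for illustration, and the host is written with www as in the question.

```php
<?php
// Hypothetical test strings: the real host and a look-alike.
$good = '<a href="http://www.example.org/page">ok</a>';
$bad  = '<a href="http://wwwXexample.org/page">not ok</a>';

// With '#' as the delimiter, '/' needs no escaping inside the pattern.
$loose  = '#<a href="(http://www.example.org/.*?)".*?>.*?</a>#ms';   // '.' matches any character
$strict = '#<a href="(http://www\.example\.org/.*?)".*?>.*?</a>#ms'; // '\.' matches only a literal dot

$looseOnBad   = preg_match($loose, $bad);    // 1: the unescaped dot accepts the 'X'
$strictOnBad  = preg_match($strict, $bad);   // 0: a literal dot is required
$strictOnGood = preg_match($strict, $good);  // 1

var_dump($looseOnBad, $strictOnBad, $strictOnGood);
```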

Answer 3

I think this should help:

/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms
<a href="http://www.example.org/ANYTHING">text</a>

Result:

Array
(
    [0] => <a href="http://www.example.org/ANYTHING">text</a>
    [1] => http://www.example.org/ANYTHING
)
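For reference, the result above can be reproduced with preg_match, and preg_match_all extends it to several links. This is only a sketch: the second link in $page is a made-up example, and the asker's match_all helper is not used because its code is not shown.

```php
<?php
$pattern = '/<a href="(http:\/\/www.example.org\/.*?)".*?>.*?<\/a>/ms';
$subject = '<a href="http://www.example.org/ANYTHING">text</a>';

// preg_match stops at the first match: index 0 is the whole match,
// index 1 is the first capture group.
$match = [];
if (preg_match($pattern, $subject, $match) === 1) {
    print_r($match);
}

// preg_match_all collects every match; with the default PREG_PATTERN_ORDER,
// $all[1] lists the first-group capture of each match.
$page = $subject . "\n" . '<a href="http://www.example.org/OTHER" class="x">more</a>';
preg_match_all($pattern, $page, $all);
print_r($all[1]);
```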

Edit: I always find this site very useful when I want to try out preg_match: http://www.solmetra.com/scripts/regex/index.php