Question

I'm trying to get the value of the href attribute of the first <a> tag linking to an image in a post.
This is what I have so far:

$pattern = "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i";
$output = preg_match_all($pattern, $post->post_content, $matches);
$first_link = $matches[1][0];

However, it doesn't work.

I have code that gets the src value of the <img> tag, and that one does work:

$pattern = '/<img.+src=[\'"]([^\'"]+)[\'"].*>/i';
$output = preg_match_all($pattern, $post->post_content, $matches);
$first_img = $matches[1][0];

Since I'm no expert in regular expressions or PHP, I don't know what I'm doing wrong.
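One way to see what is going wrong is to look at what each capture group actually holds. The sketch below runs both patterns from the question against a hypothetical sample post (the `$content` string is made up for illustration). Capture groups are numbered by their opening parenthesis, so in the <a> pattern group 1 is the quote character, not the URL; the <img> pattern has only one capturing group, which is why `$matches[1][0]` works there.

```php
<?php
// Hypothetical sample post content: a link wrapping an image.
$content = '<a href="http://example.com/photo.jpg"><img src="http://example.com/thumb.png"></a>';

// The <a> pattern from the question, unchanged.
$aPattern = "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i";
preg_match_all($aPattern, $content, $m);
// Group 1 is the first parenthesised group: the opening quote, not the URL.
echo "group 1: ", $m[1][0], "\n";
// Group 2 is the path without its extension; group 3 is the extension itself.
echo "group 2: ", $m[2][0], "\n";
echo "group 3: ", $m[3][0], "\n";

// The <img> pattern has a single capturing group, so group 1 is the whole URL.
$imgPattern = '/<img.+src=[\'"]([^\'"]+)[\'"].*>/i';
preg_match_all($imgPattern, $content, $m2);
echo "img src: ", $m2[1][0], "\n";
```

Here group 1 comes out as `"`, group 2 as `http://example.com/photo` and group 3 as `jpg`, while the <img> pattern yields `http://example.com/thumb.png`.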

Also, I couldn't find any decent, well-organized guide to regular expressions, so a link to one would be helpful too!

Answer 1

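This isn't a problem you should solve with regular expressions. If you want to parse HTML, what you need is an HTML parser, and PHP already ships with a very good one!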

$html = <<<HTML
<a href="http://somesillyexample.com/some/silly/path/to/a/file.jpeg">
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // don't emit warnings for imperfect real-world markup
$dom->loadHTML($html); // load HTML from a string
$elements = $dom->getElementsByTagName('a'); // get all <a> elements in the DOM
foreach ($elements as $node) {
    /* If the element has an href attribute let's get it */
    if ($node->hasAttribute('href')) {
        echo $node->getAttribute('href') . "\n";
    }
}
/*
will output:

http://somesillyexample.com/some/silly/path/to/a/file.jpeg
*/

See the DOMDocument documentation for details.
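The loop above prints every href. To answer the original question (the first link that points to an image), one might wrap the same DOMDocument approach in a small function. This is a sketch under an assumption: "points to an image" is taken to mean that the URL path ends in one of the extensions from the question's regex. The function name `firstImageLink` is hypothetical.

```php
<?php
/**
 * Returns the href of the first <a> whose URL path ends in an image
 * extension (bmp, gif, jpg, jpeg, png), or null if there is none.
 */
function firstImageLink(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Post content is rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href'); // '' when the attribute is missing
        // Test only the path component, so "?query" or "#fragment" suffixes don't interfere.
        $path = (string) parse_url($href, PHP_URL_PATH);
        if (preg_match('/\.(bmp|gif|jpe?g|png)$/i', $path)) {
            return $href;
        }
    }
    return null;
}

$post = '<p><a href="/about">About</a> <a href="http://example.com/pics/cat.JPG?size=large"><img src="t.png"></a></p>';
echo firstImageLink($post), "\n";
```

The regex here is only applied to a single, already-extracted attribute value, which is a much safer use than matching against raw HTML.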

Answer 2

You should use a DOM parser. If you can use a third-party library, take a look at PHP Simple HTML DOM Parser. It makes your task very simple:

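require_once 'simple_html_dom.php'; // the library's source file (path assumed)

$html = new simple_html_dom();
$html->load($post->post_content);

$anchor = $html->find('a', 0); // index 0 = first match; null if there is none
$first_link = $anchor ? $anchor->href : null;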

If for some reason you can't use this library, PHP's built-in DOM module is still a better choice than regular expressions.
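With the built-in DOM extension, the "first matching element" lookup can also be expressed as a single XPath query via `DOMXPath`, which comes close to the one-liner `find('a', 0)` above. A minimal sketch, using a made-up HTML string:

```php
<?php
$html = '<div><a name="top">no href</a><a href="http://example.com/a.png">first</a><a href="/b">second</a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect markup
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// "(//a[@href])[1]" selects the first <a> in document order that has an href attribute.
$nodes = $xpath->query('(//a[@href])[1]');
$first_link = $nodes->length > 0 ? $nodes->item(0)->getAttribute('href') : null;

echo $first_link, "\n";
```

The parentheses matter: `//a[@href][1]` would instead select every <a> that is the first href-bearing <a> among its siblings.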

Answer 3

A few notes on your regex:

 "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i"
      ^ that's greedy, should be +?
     ^ that's any char, should be a non-closing-tag character: [^>]

 "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i"
            ^^^^^^ for readability use ['\"]

 "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i"
                       ^ that's any char, you probably wanted \.

 "/<a.+href=('|\")(.*?).(bmp|gif|jpeg|jpg|png)('|\").*>/i"
                    ^^ that's ungreedy (good!)       ^ see above (greedy any char)
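Putting those notes together, a corrected pattern might look like the sketch below. Two changes go beyond the notes and are assumptions of this sketch: the opening quote is captured and reused with `\1` so the closing quote must match it, and the URL part uses `[^'"]*?` rather than `.*?` so a match can't run past the end of one attribute into the next tag. Because only the first link is wanted, `preg_match` is used instead of `preg_match_all`.

```php
<?php
// The question's pattern with the notes applied:
//   [^>]+? instead of .+     (non-greedy, and can't run past the end of the tag)
//   ['"] instead of ('|\")   (a character class is easier to read)
//   \. instead of .          (a literal dot before the extension)
//   [^>]* instead of .*      (stop at the end of the opening tag)
// Extra assumption: group 1 is the quote, reused as \1; group 2 is the full URL.
$pattern = '/<a[^>]+?href=([\'"])([^\'"]*?\.(?:bmp|gif|jpeg|jpg|png))\1[^>]*>/i';

$content = '<a class="x" href="http://example.com/img/photo.jpeg" title="photo"><img src="thumb.gif"></a>';
if (preg_match($pattern, $content, $m)) {
    echo $m[2], "\n";
}
```

On this sample it should print `http://example.com/img/photo.jpeg`; it still relies on assumptions about the markup, which is why the DOM approaches are preferable.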

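I can't test this right now because I don't have PHP here, but if you correct these issues your problem may well be solved. Also take a look at the pattern modifier /U, which inverts the default greediness.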
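To illustrate what /U does: by default `*` is greedy and `*?` is lazy; with /U the two meanings swap. A small demonstration on a made-up string:

```php
<?php
$s = '<b>one</b><b>two</b>';

// Default: .* is greedy (runs to the last </b>), .*? is lazy (stops at the first).
preg_match('/<b>.*<\/b>/', $s, $greedy);
preg_match('/<b>.*?<\/b>/', $s, $lazy);

// With /U the meanings are inverted: .* is lazy and .*? is greedy.
preg_match('/<b>.*<\/b>/U', $s, $lazyU);
preg_match('/<b>.*?<\/b>/U', $s, $greedyU);

echo $greedy[0], "\n";
echo $lazy[0], "\n";
echo $lazyU[0], "\n";
echo $greedyU[0], "\n";
```

This prints `<b>one</b><b>two</b>`, `<b>one</b>`, `<b>one</b>`, `<b>one</b><b>two</b>`. Using /U on the question's pattern would make the `.+` and `.*` lazy, but also turn the `(.*?)` greedy, so it isn't a drop-in fix on its own.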

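However, this problem has been solved many times already, so you should use an existing solution (a DOM parser). For example, you don't allow quotes inside the href (which is probably fine for an href, but later you'll copy and paste your regex to parse another HTML attribute in which quotes are valid characters).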
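As a concrete illustration of that copy-and-paste risk, here is the question's <img> pattern adapted to read `alt` instead of `src` (the adaptation and the sample markup are hypothetical). Because `[^'"]+` stops at any quote, an apostrophe inside the value truncates it, while the DOM parser applies HTML's quoting rules and returns the whole value.

```php
<?php
// Hypothetical markup: an attribute value that legitimately contains an apostrophe.
$html = '<img alt="Bob\'s photo" src="bob.jpg">';

// The question's <img> pattern, copied and pointed at alt instead of src.
preg_match('/<img.+alt=[\'"]([^\'"]+)[\'"].*>/i', $html, $m);
$fromRegex = $m[1]; // cut off at the apostrophe

// The DOM parser knows the value is delimited by double quotes.
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$fromDom = $dom->getElementsByTagName('img')->item(0)->getAttribute('alt');

echo "regex: ", $fromRegex, "\n";
echo "DOM:   ", $fromDom, "\n";
```

The regex yields only `Bob`, whereas the DOM lookup yields `Bob's photo`.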

Regex + preg_match_all - getting the value of an attribute
