Question

I am trying to extract all img tags from an HTML string. Here is the code:

$d1     = file_get_contents("http://itcapsule.blogspot.com/feeds/posts/default?alt=rss");
preg_match_all('/<img[^>]+>/i',$d1,$result);
print_r($result);

The result is

Array ( [0] => Array ( ) )

But the same regular expression gives the correct result in the online regex tester at http://regex.larsolavtorvik.com/.

What could be the problem?

Answer 1

Do not use regular expressions to process HTML; use a parser instead.

Answer 2

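The content you are parsing is encoded with HTML entities: basically, < is replaced with &lt; (and > with &gt;), so the string contains no literal <img for the pattern to match. Use html_entity_decode first to convert the data to plain HTML.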
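A minimal sketch of that decode-then-match step. The sample string is hypothetical and only stands in for an entity-encoded feed description; the pattern is the one from the question.

```php
<?php
// Hypothetical sample: markup that is entity-encoded, as it may appear
// inside an RSS <description> element.
$encoded = 'Post text &lt;img src="a.png" alt="A"&gt; more &lt;IMG src="b.png"&gt;';

// The pattern from the question finds nothing here, because the string
// contains "&lt;img", not a literal "<img".
preg_match_all('/<img[^>]+>/i', $encoded, $before);

// Decode the entities first, then run the same pattern again.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
preg_match_all('/<img[^>]+>/i', $decoded, $after);

print_r($before[0]); // no matches
print_r($after[0]);  // both img tags
```

Whether decoding alone is enough for a real feed depends on how that feed escapes its content, so treat this as an illustration of the idea rather than a complete fix.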

PS: Use an HTML parser rather than regular expressions.
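As a rough illustration of the parser approach, the sketch below uses PHP's bundled DOM extension (DOMDocument), assuming it is enabled. The sample HTML and the $sources variable are hypothetical; in the question's case the input would be the decoded feed content.

```php
<?php
// Hypothetical sample HTML standing in for the decoded feed content.
$html = '<p>Intro <img src="a.png" alt="A"> text <img src="b.png"></p>';

// Collect libxml warnings internally instead of printing them,
// since real-world markup is often not well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every img element and collect its src attribute.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

print_r($sources);
```

Working with parsed elements gives direct access to attributes such as src, which the regex approach would need a second pattern to extract.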

Answer 3

Using the SimplePie XML parser solved the problem:

include_once 'simplepie.inc';

// Placeholder: replace with the actual feed URL.
$feed = "feedurl";

$data = new SimplePie($feed);
$data->init();
$data->handle_content_type();

foreach ($data->get_items() as $item) {
    // The item description comes back as HTML markup, so the regex can match.
    $desc = $item->get_description();
    preg_match_all('/<img[^>]+>/i', $desc, $result);
    print_r($result);
}

This is what I was looking for :) Thanks, guys!

PHP preg_match_all returns nothing
