How to use a regular expression to get img tag values inside a specific div and specific anchor tags

Time: 2013-04-23 05:05:15

Tags: php regex html-parsing

I am new to regular expressions and have tried many times to get the image tag values inside the anchor tags of my HTML. Here is the HTML:

<div class="smallSku" id="ctl00_ContentPlaceHolder1_smallImages">
                                <a title="" name="http://www.playg.in/productImages/med/PNC000051_PNC000051.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
    <img border="0" alt="" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051.jpg"></a>    <a title="PNC000051_PNC000051_1.jpg" name="http://www.playg.in/productImages/med/PNC000051_PNC000051_1.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051_1.jpg" onclick="return showPic(this)" onmouseover="return showPic(this)">
    <img border="0" alt="PNC000051_PNC000051_1.jpg" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051_1.jpg"></a>
                        </div>

I want to return only the src values of the image tags. I tried matching with preg_match_all(), using this pattern:

"@<div[\s\S]class="smallSku"[\s\S]id="ctl00_ContentPlaceHolder1_smallImages"\><a title=\"\" name="[\w\W]" href="[\w\W]" onclick=\"[\w\W]" onmouseover="[\w\W]"\><img[\s\S]src="(.*)"[\s\S]></a><\/div>@"

Please help. I have tried many times, and I also tried the approach from this link: Match image tag not nested in an anchor tag using regular expression

2 Answers:

Answer 0 (score: 5)

Regular expressions are not the right tool for parsing HTML. See this FAQ: How to parse and process HTML/XML?
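To illustrate why, here is a minimal sketch of how brittle a tag-matching pattern can be. The pattern, the two sample strings, and the helper `firstImgSrc()` are made up for this illustration and are not taken from the question. The regex matches only when the attributes appear in one exact order, while a DOM parser reads the same `src` from both variants.

```php
<?php
// Illustrative pattern: expects border, alt, src in exactly this order.
$pattern = '~<img\s+border="0"\s+alt="[^"]*"\s+src="([^"]*)"~';

$original  = '<img border="0" alt="" src="a.jpg">';
$reordered = '<img src="a.jpg" alt="" border="0">'; // same tag, attributes reordered

$matchesOriginal  = preg_match($pattern, $original);
$matchesReordered = preg_match($pattern, $reordered);

// Hypothetical helper: read the first <img> src through the DOM instead.
function firstImgSrc(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);
    return $img ? $img->getAttribute('src') : null;
}

var_dump($matchesOriginal, $matchesReordered);
var_dump(firstImgSrc($original), firstImgSrc($reordered));
```

The regex result changes with attribute order, whereas the DOM lookup does not depend on it.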

Here is how to get the src attributes from your sample:

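$doc = new DOMDocument();
// Parse the markup; $your_html_string holds the HTML from the question.
$doc->loadHTML($your_html_string);
$xpath = new DOMXPath($doc);

// Select the src attribute of every <img> inside an <a> inside the "smallSku" div.
foreach ($xpath->query('//div[@class="smallSku"]/a/img/@src') as $attr) {
    $src = $attr->value;
    print $src . PHP_EOL;
}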
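For anyone who wants to run this end to end, the following is a rough, self-contained sketch of the same DOMXPath idea. It assumes the div can be selected by its id rather than its class, wraps the logic in a hypothetical helper named `extractThumbSrcs()`, and uses a trimmed inline copy of the question's markup. The id is concatenated into the XPath string, so this only suits trusted ids that contain no double quotes.

```php
<?php
// Hypothetical helper: returns the src of every <img> that sits directly inside
// an <a> which is a direct child of the div with the given id.
function extractThumbSrcs(string $html, string $divId): array
{
    $previous = libxml_use_internal_errors(true); // collect parser warnings silently
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $srcs = [];
    // Assumption: $divId is trusted and contains no double quotes.
    foreach ($xpath->query('//div[@id="' . $divId . '"]/a/img/@src') as $attr) {
        $srcs[] = $attr->value;
    }
    return $srcs;
}

// Trimmed copy of the markup from the question.
$html = <<<'HTML'
<div class="smallSku" id="ctl00_ContentPlaceHolder1_smallImages">
    <a title="" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051.jpg">
    <img border="0" alt="" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051.jpg"></a>
    <a title="PNC000051_PNC000051_1.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051_1.jpg">
    <img border="0" alt="PNC000051_PNC000051_1.jpg" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051_1.jpg"></a>
</div>
HTML;

$srcs = extractThumbSrcs($html, 'ctl00_ContentPlaceHolder1_smallImages');
print_r($srcs);
```

Selecting by id is a design choice for this sketch: an id is normally unique on a page, while a class such as "smallSku" may be shared by several elements.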

Answer 1 (score: 2)

Try this:

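    $content = file_get_contents('your url');
    // Capture the target div (class taken from the question's markup). The "s" modifier lets "."
    // span newlines, and ".*?" stops at the first </div>, which assumes no nested divs.
    preg_match_all('|<div class="smallSku"[^>]*>.*?</div>|s', $content, $arr, PREG_PATTERN_ORDER);
    // Collect every src value, single- or double-quoted, from the captured div.
    preg_match_all('/src=["\']([^"\']+)["\']/', $arr[0][0] ?? '', $arrr, PREG_PATTERN_ORDER);
    echo '<pre>';
    print_r($arrr);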
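As a self-contained check of the two-step idea above (first isolate the div, then collect the src values), here is a sketch that runs against a trimmed inline copy of the question's markup instead of a fetched URL. The patterns are my own illustration and rest on the assumptions that the target div has class "smallSku" and contains no nested div. A DOM parser remains the more robust choice.

```php
<?php
// Inline sample standing in for the fetched page (trimmed from the question).
$content = <<<'HTML'
<div class="smallSku" id="ctl00_ContentPlaceHolder1_smallImages">
    <a title="" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051.jpg">
    <img border="0" alt="" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051.jpg"></a>
    <a title="PNC000051_PNC000051_1.jpg" href="http://www.playg.in/productImages/lrg/PNC000051_PNC000051_1.jpg">
    <img border="0" alt="PNC000051_PNC000051_1.jpg" src="http://www.playg.in/productImages/thmb/PNC000051_PNC000051_1.jpg"></a>
</div>
HTML;

$srcs = [];
// Step 1: isolate the div body. "s" lets "." cross newlines; ".*?" stops at the
// first </div>, so this assumes the div contains no nested <div>.
if (preg_match('~<div\b[^>]*\bclass="smallSku"[^>]*>(.*?)</div>~s', $content, $div)) {
    // Step 2: collect the src of each <img>, accepting single or double quotes.
    // Group 1 remembers the opening quote so \1 requires the same closing quote.
    preg_match_all('~<img\b[^>]*\bsrc=(["\'])(.*?)\1~i', $div[1], $m);
    $srcs = $m[2];
}
print_r($srcs);
```

Restricting step 2 to `<img` tags keeps the href and name URLs of the surrounding anchors out of the result.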