Extracting links from HTML

Date: 2010-11-26 15:43:00

Tags: php dom

    <?php
    $cont = '<div class="video-image">
        <a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
            <img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
        </a>
        <span class="video-title"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a></span>
        <span class="video-artist"><a href="video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a></span>
    </div>';

    if (preg_match_all('#<a href="([^>]*)"#iU', $cont, $arr))
    {
        foreach ($arr[1] as $value)
        {
            var_dump($value);
            $cont = preg_replace('#' . preg_quote($value, '#') . '#iU', 'http://site.com/' . $value, $cont);
        }
    }

    echo $cont;

This returns: http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/

Why does this happen? I want http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/ instead. How can I do that? Sorry for my poor English.

Edit

    $dom = new DOMDocument;
    $dom->loadHTML($cont);
    foreach( $dom->getElementsByTagName('a') as $node )
    {
        $cont = preg_replace('#' . preg_quote($node->getAttribute('href'), '#') . '#', "http://site.com/" . $node->getAttribute('href'), $cont);
    }    
    echo $cont;

This code also returns http://site.com/http://site.com/http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/ ...

2 answers:

Answer 0 (score: 2)

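    $dom = new DOMDocument;
    $dom->preserveWhiteSpace = FALSE;
    // $xhtml holds the markup to process ($cont in the question);
    // loadXML expects well-formed XHTML with a single root element
    $dom->loadXML($xhtml);
    foreach( $dom->getElementsByTagName('a') as $node )
    {
        // Each node is visited once, so each href is prefixed exactly once
        $node->setAttribute(
            'href',
            "http://site.com/" . $node->getAttribute('href')
        );
    }
    $dom->formatOutput = TRUE;
    echo $dom->saveXML($dom->documentElement);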

Result:

    <div class="video-image">
      <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="23">
        <img src="http://i.ytimg.com/vi/_CHheEQe9M8/3.jpg" alt="TI" width="130" height="78"/>
      </a>
      <span class="video-title">
        <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="sdg">No Matter What</a>
      </span>
      <span class="video-artist">
        <a href="http://site.com/video/TI - No+Matter+What/_CHheEQe9M8/" title="ss" class="ellipsis">TI</a>
      </span>
    </div>
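
If some links in the markup might already be absolute, the DOM approach can be extended to skip them. The following is a minimal sketch under that assumption, not part of the original answer: the function name `prefixRelativeLinks`, the sample markup, and the scheme check are all hypothetical, and the input is assumed to be well-formed XHTML with a single root element.

```php
<?php
// Hypothetical helper (not from the answer): prefix only relative hrefs.
function prefixRelativeLinks(string $xhtml, string $base): string
{
    $dom = new DOMDocument;
    // Assumes well-formed XHTML with a single root element.
    $dom->loadXML($xhtml);

    foreach ($dom->getElementsByTagName('a') as $node) {
        $href = $node->getAttribute('href');
        // Leave links that already start with a scheme (http://, https://, ...) alone.
        if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href) === 0) {
            $node->setAttribute('href', $base . $href);
        }
    }

    // Serialize only the root element, without the XML declaration.
    return $dom->saveXML($dom->documentElement);
}

// Sample markup made up for illustration.
$xhtml = '<div><a href="video/abc/">A</a><a href="http://example.org/x">B</a></div>';
$out = prefixRelativeLinks($xhtml, 'http://site.com/');
echo $out, "\n";
```

Because every `a` element is visited once and its attribute is set directly, identical hrefs elsewhere in the document do not affect each other.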

Answer 1 (score: 0)

Change ^> to ^" in the pattern, since with ^> it selected too much when I tested it.
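
Note that the repeated prefix in the question seems to come from the loop rather than from the character class: the three links share the same href, and each iteration's preg_replace rewrites every occurrence in $cont, so three iterations add three prefixes. A minimal sketch of a single-pass regex alternative follows, assuming the href is the first attribute of each `a` tag as in the question's markup; the sample string is made up for illustration.

```php
<?php
// Two links with the same href, mirroring the duplicate links in the question.
$cont = '<span><a href="video/abc/" title="x">A</a></span>'
      . '<span><a href="video/abc/" title="y">B</a></span>';

// One pass over the markup: each href is matched and rewritten exactly once,
// so duplicate links are not prefixed repeatedly.
$fixed = preg_replace(
    '#(<a href=")([^"]*)"#i',
    '${1}http://site.com/${2}"',
    $cont
);

echo $fixed, "\n";
```

The `${1}` form keeps the backreference from running into the literal text that follows it. A regex like this is only reasonable for markup whose shape is known in advance; for arbitrary HTML the DOM approach is the safer choice.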