Get HTML tags with PHP

Date: 2013-09-03 13:49:59

Tags: php html

I have:

$body = '

<div id="one">
    <div id="two">
        <div class="sub">
            <span class="text"><a class="here" href="/aaa.php">ttt</a></span>
        </div>
        <span class="f">aa</span>
        <div class="sub2">
            <a class="here" href="/bbb.php">ttt</a>
            <div>
                <a class="here" href="/ttt.php">ttt</a>
            </div>
            <a class="here" href="/ddd.php">ttt</a>
        </div>
        <div class="sub">
            <a class="here" href="/zzz.php">ttt</a>
        </div>
    </div>
</div>

';

How can I get all the href values from the "a" tags into an array? I want to receive:

Array
(
    [0] => /aaa.php
    [1] => /bbb.php
    [2] => /ttt.php
    [3] => /ddd.php
    [4] => /zzz.php
)

Next, I want to change the href attributes inside the variable $body to "/test-aaa.php-123", "/test-bbb.php-123", and so on. So I want to receive:

$body = '

<div id="one">
    <div id="two">
        <div class="sub">
            <span class="text"><a class="here" href="/test-aaa.php-123">ttt</a></span>
        </div>
        <span class="f">aa</span>
        <div class="sub2">
            <a class="here" href="/test-bbb.php-123">ttt</a>
            <div>
                <a class="here" href="/test-ttt.php-123">ttt</a>
            </div>
            <a class="here" href="/test-ddd.php-123">ttt</a>
        </div>
        <div class="sub">
            <a class="here" href="/test-zzz.php-123">ttt</a>
        </div>
    </div>
</div>

';

I could do this in JavaScript, but I have to use PHP here. Is it possible?

3 answers:

Answer 0: (score: 2)

To replace the links:

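// $1 keeps the tag up to href=", $2 is the URL without its leading slash, so only the href value changes
$new_body = preg_replace('/(<a [^>]*href=")\/([^"]+)"/', '$1/test-$2-123"', $body);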

echo $new_body;

Or, to match the href links:

preg_match_all('/<a [^>]*href="([^"]+)"/', $body, $matches);

Output:

var_dump($matches);

array (size=2)
  0 => 
    array (size=5)
      0 => string '<a class="here" href="/aaa.php"' (length=31)
      1 => string '<a class="here" href="/bbb.php"' (length=31)
      2 => string '<a class="here" href="/ttt.php"' (length=31)
      3 => string '<a class="here" href="/ddd.php"' (length=31)
      4 => string '<a class="here" href="/zzz.php"' (length=31)
  1 => 
    array (size=5)
      0 => string '/aaa.php' (length=8)
      1 => string '/bbb.php' (length=8)
      2 => string '/ttt.php' (length=8)
      3 => string '/ddd.php' (length=8)
      4 => string '/zzz.php' (length=8)

The captured URLs are in $matches[1]; to loop over them:

foreach($matches[1] as $link)
{
   echo $link;
}
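
The two steps above (collecting the links and rewriting them) can also be done in a single pass. The following is only a sketch, assuming PHP 5.3 or later for the closure; the shortened $body and the variable names $links and $new_body are illustrative, and like any regex approach it assumes the href values are always wrapped in double quotes.

```php
<?php
// Shortened sample input; it uses the same <a> structure as the question's $body.
$body = '<div class="sub"><a class="here" href="/aaa.php">ttt</a></div>
<div class="sub2"><a class="here" href="/bbb.php">ttt</a></div>';

$links = array();

// Group 1: the tag up to and including href=", group 2: the URL itself.
$new_body = preg_replace_callback(
    '/(<a\s[^>]*href=")([^"]+)"/',
    function ($m) use (&$links) {
        $links[] = $m[2];                                 // remember the original URL
        $new = '/test-' . ltrim($m[2], '/') . '-123';     // build the new URL
        return $m[1] . $new . '"';                        // keep the rest of the tag intact
    },
    $body
);

print_r($links);   // the original href values
echo $new_body;    // the markup with rewritten href values
```

Because the callback returns the untouched first group plus the new URL, the class attribute and the rest of each tag stay as they were.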

Answer 1: (score: 1)

There is an easy-to-use tool that is very handy for this kind of thing: http://simplehtmldom.sourceforge.net/

The code should look something like this:

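// Include the library
include('simple_html_dom.php');
// $body is a plain string, so parse it into a DOM object first
$html = str_get_html($body);
$links = array();
foreach($html->find('a') as $a){
   $links[] = $a->href;
}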

Answer 2: (score: 1)

The HTML file:

<html>
<head>
<title>Example site</title>
</head>
<body>
<div id="one">
    <div id="two">
        <div class="sub">
            <span class="text"><a class="here" href="/test-aaa.php-123">ttt</a></span>
        </div>
        <span class="f">aa</span>
        <div class="sub2">
            <a class="here" href="/test-bbb.php-123">ttt</a>
            <div>
                <a class="here" href="/test-ttt.php-123">ttt</a>
            </div>
            <a class="here" href="/test-ddd.php-123">ttt</a>
        </div>
        <div class="sub">
            <a class="here" href="/test-zzz.php-123">ttt</a>
        </div>
    </div>
</div>
</body>
</html>

You have to download and include the HTML DOM parser to read the HTML tags. Download it from this URL: http://simplehtmldom.sourceforge.net/

Here is the PHP script that gets the links from the document:

<?php
include('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('HTML FILE.html');

// Get all links
$links = array();
foreach($html->find('a') as $element){
       $links[] = $element->href;
}
print_r($links);
?>

Output:

Array
(
    [0] => /test-aaa.php-123
    [1] => /test-bbb.php-123
    [2] => /test-ttt.php-123
    [3] => /test-ddd.php-123
    [4] => /test-zzz.php-123
)
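
If installing a third-party parser is not an option, PHP's built-in DOM extension can do both parts of the question: collect the href values and rewrite them. This is a minimal sketch, assuming the DOM extension is enabled and PHP 5.3.6 or later (for saveHTML() with a node argument); the shortened $body and the names $links and $new_body are illustrative, not taken from the answers above.

```php
<?php
// Shortened sample input; it uses the same <a> structure as the question's $body.
$body = '<div id="one"><span class="text"><a class="here" href="/aaa.php">ttt</a></span>'
      . '<div><a class="here" href="/zzz.php">ttt</a></div></div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($body);   // the parser wraps the fragment in <html><body>
libxml_clear_errors();

$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $href = $a->getAttribute('href');
    $links[] = $href;                                                  // original URL
    $a->setAttribute('href', '/test-' . ltrim($href, '/') . '-123');   // rewritten URL
}

// Serialize only the children of <body> to get the fragment back.
$new_body = '';
$bodyNode = $doc->getElementsByTagName('body')->item(0);
foreach ($bodyNode->childNodes as $child) {
    $new_body .= $doc->saveHTML($child);
}

print_r($links);
echo $new_body;
```

The serialized markup may differ from the input in whitespace or attribute formatting, since it is regenerated from the parsed tree rather than edited in place.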