Parsing HTML - cURL and regular expressions

Date: 2012-06-15 12:12:02

Tags: php regex curl

How do I get the text "Text example max" from:

<td valign="top" align="left">

    <a href="/server?tree=xabaf"
    class="normal"> Text example max </a>

</td>

using a regular expression? This is what I have so far:

include('simple_html_dom.php');
$ch = curl_init('http://www.site.com?id=325235');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$sss = curl_exec($ch);
curl_close($ch);

preg_match_all('#class="normal"?</a>$#', $sss, $arr);

3 answers:

Answer 0 (score: 1)

Solution using a regex:

$text = "<a href='/server?tree=xabaf' class='normal'> Text example max </a>";
$regex_pattern = "/<a href=\"?\'?(.*)\"?\'?>(.*)<\/a>/";
preg_match_all($regex_pattern, $text, $matches);
// The second capture group, $matches[2], holds the link text.
echo trim($matches[2][0]);
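The greedy `(.*)` groups above work on this one-line sample, but they cannot match the question's markup, where a line break separates `href` from `class`, and a line containing several links would be consumed as a single match. A somewhat tighter sketch uses lazy matching and the `s` modifier. It assumes the attribute is written exactly as `class="normal"` with double quotes, as in the question; `$html`, `$pattern` and `$texts` are illustrative names, and `$html` stands in for the `$sss` string returned by `curl_exec()`:

```php
<?php
// Sample markup copied from the question (normally the output of curl_exec()).
$html = <<<'HTML'
<td valign="top" align="left">

    <a href="/server?tree=xabaf"
    class="normal"> Text example max </a>

</td>
HTML;

// [^>]* may span the line break between the href and class attributes;
// (.*?) is lazy, so each match stops at the first closing </a>.
$pattern = '#<a\b[^>]*\bclass="normal"[^>]*>(.*?)</a>#is';

$texts = [];
if (preg_match_all($pattern, $html, $arr)) {
    // $arr[1] holds the captured link texts; trim the surrounding spaces.
    $texts = array_map('trim', $arr[1]);
}

foreach ($texts as $text) {
    echo $text, "\n"; // Text example max
}
```

This is still a regex over HTML: a different quoting style, an extra class name or a nested `</a>` will break it, which is why a DOM parser is the safer choice.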

Using PHP's DOM:

$text = "<a href='/server?tree=xabaf' class='normal'> Text example max </a>";
$dom = new DOMDocument;
$dom->loadHTML($text);
// Print the text of every <a> element in the document.
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    echo trim($link->textContent);
}

Use the DOM rather than regular expressions.
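The DOM code in this answer prints the text of every `<a>` element on the page. A sketch that selects only the link with `class="normal"` could use `DOMXPath`. It assumes an exact `class` value (an element with `class="normal other"` would not match), and the `$html` sample is a stand-in for the string returned by `curl_exec()`:

```php
<?php
// Sample markup copied from the question (normally the output of curl_exec()).
$html = <<<'HTML'
<td valign="top" align="left">

    <a href="/server?tree=xabaf"
    class="normal"> Text example max </a>

</td>
HTML;

// Real pages are rarely valid HTML; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Exact comparison on the class attribute.
$nodes = $xpath->query('//a[@class="normal"]');

$texts = [];
foreach ($nodes as $node) {
    // textContent includes the surrounding spaces, so trim it.
    $texts[] = trim($node->textContent);
}

echo implode("\n", $texts), "\n";
```

If the class list can contain several names, an XPath test such as `contains(concat(' ', normalize-space(@class), ' '), ' normal ')` is the usual workaround.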

Answer 1 (score: 0)

Since the fragment contains no other text, applying strip_tags() is enough:

$str = '<td valign="top" align="left">

    <a href="/server?tree=xabaf"
    class="normal"> Text example max </a>

</td>';

$str = trim(strip_tags($str));
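The "no other text" condition matters because `strip_tags()` keeps every text node in the string. A small sketch contrasting the question's fragment (collapsed onto one line) with the same fragment followed by a hypothetical extra cell:

```php
<?php
// Only the link text is present, so stripping the tags isolates it.
$fragment = '<td valign="top" align="left"><a href="/server?tree=xabaf" class="normal"> Text example max </a></td>';
$alone = trim(strip_tags($fragment));
echo $alone, "\n"; // Text example max

// Hypothetical extra cell: its text survives too and merges with the link text.
$withExtra = $fragment . '<td>Other cell</td>';
$mixed = trim(strip_tags($withExtra));
echo $mixed, "\n"; // Text example max Other cell
```

On a whole page fetched with cURL, `strip_tags()` would return all of the page's text, so it only fits once the relevant fragment has already been cut out.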

Answer 2 (score: 0)

You can try this:

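include('simple_html_dom.php');

$url = 'http://www.site.com?id=325235';

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);   // give up connecting after 10 seconds
$str = curl_exec($curl);
curl_close($curl);

if ($str === false) {
    exit('Request failed');
}

$html = str_get_html($str);

// The text sits in an <a class="normal">, not a <div>. Passing index 0
// makes find() return the first matching element instead of an array.
$content = $html->find('a[class=normal]', 0);
if ($content !== null) {
    echo trim($content->plaintext);
}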