Question

HTML:

 <td class="td_class"><a href="javascript:goRead('115');" onmouseover="status='read';return true;" onmouseout="status=''" onfocus="blur()">Title</a></td>

I need to use preg_match to get the title. I have already tried this regular expression:

    preg_match_all('/[^>]class=["\']td_class[\'"]*>(.*?)<\//',$result,$match);
    $datas['title'] = $match[1];
    var_dump($datas['title']);

The result is:

 <a href="javascript:goRead('115');" onmouseover="status='read';return true;" onmouseout="status=''" onfocus="blur()">Title</a>

But I only want to get the title. Does anyone know how to do that? Thanks!

Answer 1

DOMDocument works very well for this; its documentation is in the PHP manual.

A simple example:

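  // Fetch the page first if you want to parse the HTML of a website.
  // The URL needs a scheme (http:// or https://), otherwise PHP looks for a local file.
  $html = file_get_contents('http://www.pathtohtml.com');
  $doc = new DOMDocument();
  libxml_use_internal_errors(true); // keep warnings about malformed HTML quiet
  // To load an HTML file directly, use loadHTMLFile() instead.
  $doc->loadHTML($html); // parses an HTML string
  $aTags = $doc->getElementsByTagName('a');
  foreach ($aTags as $aTag) {
    // $aTag->nodeValue holds the text of the <a> tag ("Title" in the question).
    // Attributes are available too, e.g. $aTag->getAttribute('href').
    echo $aTag->nodeValue, "\n";
  }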

If you need to query the DOM more precisely, I suggest using XPath.
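
As a rough sketch of the XPath route, the snippet below pulls only the link text out of the cell from the question. The surrounding `<table><tr>` wrapper and the variable names (`$links`, `$titles`) are assumptions added for illustration, and the query assumes the `class` attribute is exactly `td_class` with no other classes on the cell.

```php
<?php
// HTML from the question, wrapped in <table><tr> (an assumption, added so the
// fragment is a valid table cell for the parser).
$html = <<<'HTML'
<table><tr><td class="td_class"><a href="javascript:goRead('115');" onmouseover="status='read';return true;" onmouseout="status=''" onfocus="blur()">Title</a></td></tr></table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every <a> that is a direct child of a <td> whose class attribute is "td_class".
$links = $xpath->query('//td[@class="td_class"]/a');

$titles = [];
foreach ($links as $link) {
    // textContent is the text inside the tag, without any markup.
    $titles[] = trim($link->textContent);
}

var_dump($titles);
```

With the fragment above, `$titles` should end up holding the single string `Title`. If the cell can carry several classes, a looser predicate such as `contains(@class, "td_class")` may be needed instead of the exact match.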

Hope this helps.
