Question

I can't get anything out when I parse markup like this example:

<h3 style="color:red; font-size:24px;">This contest is still open.</h3>

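Here is my code, but it doesn't work :( I need to parse this exact H3 tag. The page has many H3 tags, but the others don't have style="color:red; font-size:24px;", so I only want to return the content of the H3 that has style="color:red; font-size:24px;".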

$html = file_get_contents('http://www.website.com/contest.php');
preg_match( '#<h3[^>]*>(.*?)</h3>#i', $html, $match );
echo $match[1];

Answer 1

Why not use DOMDocument? It was designed for parsing HTML; regular expressions were not.

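$dom = new DOMDocument();

// Real-world HTML is often malformed; keep libxml warnings out of the output.
libxml_use_internal_errors(true);

// loadHTMLFile() can read a URL when allow_url_fopen is enabled; otherwise
// fetch the page with file_get_contents() and pass the string to loadHTML().
$dom->loadHTMLFile('http://www.website.com/contest.php');

foreach ($dom->getElementsByTagName('h3') as $h3) {
    // Only the <h3> whose style attribute matches exactly.
    if ($h3->hasAttribute('style') &&
        $h3->getAttribute('style') === 'color:red; font-size:24px;'
    ) {
        echo $h3->nodeValue;
        break;
    }
}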
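The loop above fetches the live page. To try the same logic offline, here is a minimal sketch that takes the markup as a string and returns the heading text (or null). The function name findStyledH3 and the sample HTML are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of the first <h3> whose style
// attribute is exactly $style, or null if no such <h3> exists.
function findStyledH3(string $html, string $style): ?string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('h3') as $h3) {
        // getAttribute() returns '' when the attribute is missing.
        if ($h3->getAttribute('style') === $style) {
            return $h3->nodeValue;
        }
    }
    return null;
}

// Hypothetical sample page: one plain <h3> and one styled <h3>.
$html = '<h3>Other heading</h3>'
      . '<h3 style="color:red; font-size:24px;">This contest is still open.</h3>';

echo findStyledH3($html, 'color:red; font-size:24px;'), PHP_EOL;
// prints: This contest is still open.
```

The comparison is an exact string match, so a page that writes the same style with different spacing or ordering would not match.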

Answer 2

Don't parse HTML with regular expressions. Use a real HTML parser, like this one.

Or many others.
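PHP's built-in DOM extension is one such parser (not necessarily the one linked above). As a sketch, DOMXPath can select the styled heading in a single query instead of a manual loop; the sample markup is made up for illustration.

```php
<?php
// Hypothetical sample page: one plain <h3> and one styled <h3>.
$html = '<h3>Other heading</h3>'
      . '<h3 style="color:red; font-size:24px;">This contest is still open.</h3>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // tolerate malformed HTML without printing warnings
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Select every <h3> whose style attribute equals this exact string.
$nodes = $xpath->query('//h3[@style="color:red; font-size:24px;"]');

if ($nodes !== false && $nodes->length > 0) {
    echo $nodes->item(0)->textContent, PHP_EOL;
}
```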

Answer 3

I agree with the other answers that you shouldn't use a regular expression, but since you are, I think this is closer to what you want:

preg_match( '#<h3[^>]+?>(.*?)</h3>#i', $html, $match );
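Note that `[^>]+?` only requires the opening tag to carry some attribute, so this pattern can still match an `<h3>` with a different style (the original `[^>]*` pattern simply matches the first `<h3>` on the page). If you stay with a regex, a sketch that anchors on the exact style string could look like this; the sample markup is made up, and the pattern remains brittle against different quoting, spacing, or attribute values.

```php
<?php
// Hypothetical sample page: one plain <h3> and one styled <h3>.
$html = '<h3>Other heading</h3>'
      . '<h3 style="color:red; font-size:24px;">This contest is still open.</h3>';

$style = 'color:red; font-size:24px;';
// preg_quote() escapes regex metacharacters; passing '#' also escapes the delimiter.
// The 's' modifier lets (.*?) span line breaks inside the heading.
$pattern = '#<h3\b[^>]*\bstyle="' . preg_quote($style, '#') . '"[^>]*>(.*?)</h3>#is';

if (preg_match($pattern, $html, $match)) {
    echo $match[1], PHP_EOL;
}
```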

PHP: parsing tags that have a style attribute
