How to extract a single line from a block of HTML

Time: 2014-10-08 04:48:29

Tags: php html regex

My content is:

<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />

I need to extract a single line, the one containing meta property="og:image", so the result should be:

<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />

2 answers:

Answer 0: (score: 1)

^<meta property="og:image".*$

Try this with the m and g flags. See the demo.

http://regex101.com/r/hQ1rP0/48
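
For PHP specifically, here is a minimal sketch of how the pattern above might be applied. It assumes the markup is already in a string (the variable names $html, $pattern, and $matches are illustrative). Note that PCRE in PHP has no g flag; preg_match_all() plays that role, while the m modifier makes ^ and $ match at line boundaries.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />
HTML;

// "m" = multiline: ^ and $ anchor to the start and end of each line.
// There is no "g" modifier in PHP; preg_match_all() collects every match.
$pattern = '/^<meta property="og:image".*$/m';

$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full text of each matching line.
foreach ($matches[0] as $line) {
    echo $line, PHP_EOL;
}
```

This only works while the tag starts at the beginning of a line and sits on a single line, with the attributes in exactly this order.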

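Answer 1: (score: 1)

Extracting "lines" of HTML, or parsing HTML with regular expressions in general, is fragile. A more robust approach is to use an HTML parser, such as the one provided by the DOM extension.

Example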

$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />
HTML;

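// Parse the markup into a DOM tree.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every <meta> element whose property attribute is "og:image".
$nodes = $xpath->query('//meta[@property="og:image"]');

// Serialize each matching node back to HTML.
foreach ($nodes as $node) {
    echo $dom->saveHTML($node);
}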

Output:

<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209">
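
If only the image URL is wanted rather than the whole tag, the same query can be reused and the content attribute read directly. This is a sketch under that assumption, since the question asks only for the full line; the $urls array is an illustrative name.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<meta property="og:type" content="article" />
<meta property="og:url" content="http://website/fox/" />
<meta property="og:site_name" content="The Fox" />
<meta property="og:image" content="http://images.Fox.com/2014/09/foxandforset.gif?w=209" />
<meta property="og:title" content="Fox goes to forest" />
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect the content attribute of every og:image meta element.
$urls = [];
foreach ($xpath->query('//meta[@property="og:image"]') as $node) {
    $urls[] = $node->getAttribute('content');
}

foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Real-world pages often contain markup that makes loadHTML() emit warnings; calling libxml_use_internal_errors(true) beforehand is one way to keep those out of the output.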