Question

I am following up on this question about how to retrieve all tags in PHP.

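Specifically (under WordPress), I want to find all <pre> tags together with all of their available information (attributes and text). However, it seems I am not very skilled with preg_match, so I am turning to you.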

My text contains various <pre> tags, some with attributes and some with only text. My function is:

function getPreTags($string) {
    $pattern = "/<pre\s?(.*)>(.*)<\/pre>/";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}

I have tested it with a single <pre> tag, but I get count(getPreTags(myHTMLbody)) = 0 and I don't know why. This is the test string:

<pre class="wp-code-highlight prettyprint prettyprinted" style=""><span class="com">Whatever &lt;</span> I've written &gt;&gt; here <span class="something">should be taken care of</span></pre>

Any hints?

Cheers!

Answer 1

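As always, parsing HTML with regular expressions is never going to cut it. There are too many things to consider (tag soup, spacing: <pre> == < pre > == <\n\t\sPrE\n\n> ...), and any regex will fail at some point. That is why parsers exist, and they are readily available.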
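To make the failure concrete, here is a small sketch that runs the pattern from the question against two inputs. The input strings are made up for illustration; only the pattern comes from the question. It shows two of the ways the regex breaks: `.` does not match a newline by default, and the greedy `(.*)` runs across tag boundaries.

```php
<?php
// The pattern from the question, unchanged.
$pattern = '/<pre\s?(.*)>(.*)<\/pre>/';

// Case 1 (hypothetical input): the content spans two lines.
// Without the "s" modifier, "." cannot cross the newline, so nothing matches.
$multiLine = "<pre class=\"a\">line one\nline two</pre>";
$multiLineResult = preg_match($pattern, $multiLine, $m1);

// Case 2 (hypothetical input): two tags on one line.
// The greedy first group swallows the whole first tag and the start of the second.
$twoTags = '<pre id="a">A</pre><pre>B</pre>';
$twoTagsResult = preg_match($pattern, $twoTags, $m2);

var_dump($multiLineResult); // int(0)
var_dump($m2[1], $m2[2]);   // captured "attributes" and "content" for case 2
```

In the second case the "attributes" group ends up holding markup from both tags, which is presumably not what anyone wants. Note also that preg_match stops at the first match, so it could never return more than one tag anyway.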

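That said, I don't know why the other answers go to the trouble of using DOMXPath to get all instances of the pre tag, including those without attributes.
I would go for something simpler, like: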

$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$preTags = $dom->getElementsByTagName('pre');
foreach($preTags as $pre)
{
    echo $pre->nodeValue, PHP_EOL;
    if ($pre->hasAttributes())
    {//if there are attributes
        foreach($pre->attributes as $attribute)
        {
            //do something with attribute
            echo 'Attribute: ', $attribute->name, ' = ', $attribute->value, PHP_EOL;
        }
    }
}
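If you would rather collect the data than echo it, the loop above could be wrapped in a function along these lines. This is only a sketch: the name `collectPreTags` and the shape of the returned array are my own assumptions, not part of the original answer. It also keeps the inner markup (the <span> elements from the question's test string), which `nodeValue` alone would drop.

```php
<?php
// Hypothetical helper (name and return shape are assumptions):
// collects attributes, plain text, and inner markup of every <pre> element.
function collectPreTags(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('pre') as $pre) {
        $attributes = [];
        foreach ($pre->attributes as $attribute) {
            $attributes[$attribute->name] = $attribute->value;
        }

        // Serialize each child node to keep nested tags such as <span>.
        $innerHtml = '';
        foreach ($pre->childNodes as $child) {
            $innerHtml .= $dom->saveHTML($child);
        }

        $result[] = [
            'attributes' => $attributes,
            'text'       => $pre->nodeValue,
            'innerHtml'  => $innerHtml,
        ];
    }
    return $result;
}

// Example input, made up for illustration.
$tags = collectPreTags(
    '<pre class="code" style="">a <span class="com">b</span></pre><pre>plain</pre>'
);
print_r($tags);
```

With this, `count($tags)` gives the number of <pre> elements, and a tag without attributes simply has an empty `attributes` array.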

You can easily find the available methods and properties on the following pages:

Answer 2

You are better off using a DOM parser to parse HTML. Consider the following code:

$html = <<< EOF
<a href="http://example.com/foo.htm" class="curPage">Click link1</a> morestuff
<pre>A    B    C</pre>
<a href="http://notexample.com/foo/bar">notexample.com</a> morestuff
<pre id="pre1">X    Y    Z</pre>
<a href="http://example.com/foo.htm">Click link1</a>
<pre id="pre2">1    2    3</pre>
EOF;

// create a new DOM object
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);

// select all pre tags with attributes
$nodelist = $xpath->query("//pre[@*]");

// iterate through selected nodes and print them
for($i=0; $i < $nodelist->length; $i++) {
    $node = $nodelist->item($i);
    var_dump($node->nodeValue);
}

Output:

string(11) "X    Y    Z"
string(11) "1    2    3"
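Note that `//pre[@*]` only selects <pre> elements that have at least one attribute, which is why "A    B    C" is missing from the output. If you also want the tags without attributes, as the question asks, a plain `//pre` should do. A minimal sketch with a shortened, made-up input:

```php
<?php
// Shortened example input (an assumption, not the original $html).
$html = '<pre>A    B    C</pre><pre id="pre1">X    Y    Z</pre>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "//pre" matches every <pre>; "//pre[@*]" skips the ones without attributes.
$all = $xpath->query('//pre');
$withAttributes = $xpath->query('//pre[@*]');

foreach ($all as $node) {
    // getAttribute() returns an empty string when the attribute is absent.
    $id = $node->getAttribute('id');
    echo ($id !== '' ? $id : '(no id)'), ': ', $node->nodeValue, PHP_EOL;
}
```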

Answer 3

If the data is well-formed XML, you can use XPath expressions.

Just a very quick example:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <pre>1</pre>
    <pre>2</pre>
    <pre>3</pre>
  </body>
</html>

Then PHP like this:

<?php
// Load the XML file shown above
$xmldoc = new DOMDocument();
$xmldoc->load('test.xml');

$xpathvar = new DOMXPath($xmldoc);

// Count all <pre> elements below the root element
echo $xpathvar->evaluate('count(*//pre)');
?>

This also works for HTML/XML fragments.
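For a fragment held in a string, the same idea might look like the sketch below. It assumes the fragment is well-formed XML with a single root element, because `loadXML` fails otherwise; the fragment itself is made up for illustration.

```php
<?php
// Hypothetical fragment: well-formed, with one root element (<div>).
$fragment = '<div><pre>1</pre><pre class="x">2</pre></div>';

$doc = new DOMDocument();
$doc->loadXML($fragment);

$xpath = new DOMXPath($doc);

// evaluate() returns the XPath number as a float, so cast it for display.
$count = (int) $xpath->evaluate('count(//pre)');
echo $count, PHP_EOL;
```

For real-world HTML that is not well-formed, `loadHTML` is the more forgiving choice.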

Find all <pre> tags in PHP (with attributes)
