Question

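Both of the methods below serve the same purpose: scan a post's content and determine whether at least one img tag has an alt attribute that contains the "keyword" being tested.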

I am new to XPath and would prefer to use it, depending on how expensive that approach is compared with the regular-expression version...

Method #1 uses preg_match

function image_alt_text_has_keyword($post)
{
    $theKeyword = trim(wpe_getKeyword($post));
    $theContent = $post->post_content;
    $myArrayVar = array();
    preg_match_all('/<img\s[^>]*alt=\"([^\"]*)\"[^>]*>/siU', $theContent, $myArrayVar);
    foreach ($myArrayVar[1] as $theValue)
    {
        if (keyword_in_content($theKeyword, $theValue)) return true;
    }
    return false;
}

function keyword_in_content($theKeyword, $theContent)
{
    return preg_match('/\b' . $theKeyword . '\b/i', $theContent);
}

Method #2 uses XPath

function keyword_in_img_alt()
{
    global $post;
    $keyword = trim(strtolower(wpe_getKeyword($post)));
    $dom = new DOMDocument;
    $dom->loadHTML(strtolower($post->post_content));
    $xPath = new DOMXPath($dom);
    return $xPath->evaluate('count(//a[.//img[contains(@alt, "'.$keyword.'")]])');
}

Answer 1

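If you are parsing XML, you should use XPath, because it was designed for that purpose. XML/XHTML is not a regular language and cannot be parsed correctly with regular expressions. You may be able to write a regex that works some of the time, but there will be edge cases where it fails.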
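To make the edge-case point concrete, here is a minimal sketch that lets the DOM parser find the alt attributes and uses a regex only for the whole-word test on the extracted text. The function names `img_alt_texts` and `image_alt_has_keyword` are hypothetical and not part of the question's code, and the sketch leaves character-encoding handling out for brevity.

```php
<?php
// Hypothetical helper (not from the question): collect the alt text of every <img>.
function img_alt_texts(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input, so bail out early
    }
    $dom = new DOMDocument();
    // Post content is rarely well-formed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $alts = [];
    foreach ($xpath->query('//img/@alt') as $attr) {
        $alts[] = $attr->nodeValue;
    }
    return $alts;
}

// Hypothetical helper: true if any alt text contains $keyword as a whole word.
function image_alt_has_keyword(string $html, string $keyword): bool
{
    // preg_quote() keeps regex metacharacters in the keyword from altering the pattern.
    $pattern = '/\b' . preg_quote($keyword, '/') . '\b/i';
    foreach (img_alt_texts($html) as $alt) {
        if (preg_match($pattern, $alt) === 1) {
            return true;
        }
    }
    return false;
}

// The alt attribute is single-quoted here, which is valid HTML.
$html = "<p><img src='a.png' alt='Red shoes on sale'></p>";
var_dump(image_alt_has_keyword($html, 'shoes')); // bool(true)
```

The pattern from Method #1 only accepts double-quoted alt values, so it finds no match in this markup, while the parser-based version does not care how the attribute is quoted.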

Answer 2

Using RegEx to select nodes in an XML document is about as appropriate as using it to find out whether a given number is prime.

The fact that this is possible does not make it any more appropriate.

Moreover, XPath 2.0 has RegEx support, whereas RegEx has no XPath support. So if you need both, it is best to use XPath 2.0.
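One caveat for the PHP code in the question: DOMXPath is built on libxml2, which implements XPath 1.0, so the XPath 2.0 regex functions are not available there. A rough way to get a similar effect is to expose a PHP callback to the expression with `registerPhpFunctions()`. The sketch below assumes two hypothetical helpers, `alt_has_word` and `count_imgs_with_keyword`, neither of which comes from the question.

```php
<?php
// Hypothetical callback exposed to XPath: whole-word, case-insensitive match.
function alt_has_word(string $alt, string $keyword): bool
{
    return preg_match('/\b' . preg_quote($keyword, '/') . '\b/i', $alt) === 1;
}

// Hypothetical helper: number of <img> elements whose alt contains $keyword as a word.
function count_imgs_with_keyword(string $html, string $keyword): int
{
    // XPath 1.0 string literals have no escape syntax, so this sketch
    // simply refuses keywords that contain a double quote.
    if (trim($html) === '' || strpos($keyword, '"') !== false) {
        return 0;
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('php', 'http://php.net/xpath');
    $xpath->registerPhpFunctions('alt_has_word'); // allow only this callback
    // php:functionString() passes its arguments to the callback as strings.
    $expr = 'count(//img[php:functionString("alt_has_word", @alt, "' . $keyword . '")])';
    return (int) $xpath->evaluate($expr);
}

$html = '<p><img src="a.png" alt="Red shoes"><img src="b.png" alt="horseshoes"></p>';
var_dump(count_imgs_with_keyword($html, 'shoes')); // int(1)
```

Unlike `contains()`, which is a plain substring test and would also accept "horseshoes", the callback keeps the word-boundary behaviour of the regex version while the parser still handles the markup.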

Using XPath or Regex?
