Question

I am currently using PHP DOM to parse some tags in an HTML document. I want to get the value of the content attribute of the "keywords" meta tag UNCHANGED.

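For example, the string "keyword1,keyword2 , keyword2,keyword3" (whose middle separator is entity-encoded in the source HTML) is returned as "keyword1,keyword2,keyword2,keyword3", which breaks the actual number of keywords in the output XML document.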

I have already tried "htmlentities()", but it did not do anything.
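A minimal sketch that reproduces the behaviour described above. It assumes the middle comma is written as `&#44;` in the source HTML; the exact entity is not shown in the question, so treat the sample markup as hypothetical.

```php
<?php
// Assumption: the middle comma is entity-encoded as &#44; in the source HTML.
$html = '<html><head><meta name="keywords" content="keyword1,keyword2&#44;keyword2,keyword3"></head><body></body></html>';

$doc = new DOMDocument();
// Suppress warnings that libxml may raise on real-world markup.
@$doc->loadHTML($html);

$content = null;
foreach ($doc->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') === 'keywords') {
        $content = $meta->getAttribute('content');
    }
}

echo $content, PHP_EOL;               // the entity arrives already decoded to a plain comma
echo htmlentities($content), PHP_EOL; // a plain comma has no named entity, so nothing changes
```

The parser decodes character references while building the tree, so by the time getAttribute() runs the original encoding is already gone, and htmlentities() has nothing to restore.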

Answer 1

I know this is very late, but after revisiting my code to make some edits, I found a solution using regular expressions.

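function GetMetaTagsContentIntact($html, $meta_name)
{
    // Extract a single attribute value from the raw source of one HTML tag.
    $get_attribute_value = function($attrib, $tag)
    {
        // Accepts quoted ('...' or "...") and unquoted attribute values.
        $re = '/' . preg_quote($attrib, '/') . '=([\'"])?((?(1).+?|[^\s>]+))(?(1)\1)/is';
        if (preg_match($re, $tag, $match))
        {
            // urldecode() only touches percent-escapes and '+'; HTML entities stay as written.
            return urldecode($match[2]);
        }
        return false;
    };

    // Get all meta tags that have a double-quoted name attribute followed by a content attribute.
    $output = array();
    preg_match_all("|<meta[^>]+name=\"([^\"]*)\"[^>]" . "+content=\"([^\"]*)\"?[^>]+>|i", $html, $output, PREG_PATTERN_ORDER);
    $output = $output[0];
    // Get the specified meta tag's content value.
    foreach($output as $tag)
    {
        if($meta_name == trim($get_attribute_value("name", $tag)))
        {
            return $get_attribute_value("content", $tag);
        }
    }

    // No meta tag with the requested name was found.
    return false;
}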

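This takes the raw HTML (preferably already parsed) and uses regular expressions to pick out the meta tags themselves, then extracts the content value of the requested meta tag from them.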

However, to successfully append the data to, say, an XML document as I did, you need to use "textContent" specifically. More on that: PHP: DOMNode - Manual
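A rough sketch of the "textContent" step, under stated assumptions: `$raw` stands in for a still-encoded value returned by the regex-based extraction, and the element names `page` and `keywords` are hypothetical, not taken from the original code.

```php
<?php
// Assumption: $raw is a still-encoded value obtained from the raw HTML source.
$raw = 'keyword1,keyword2&#44;keyword2,keyword3';

$xml = new DOMDocument('1.0', 'UTF-8');
$root = $xml->createElement('page');   // hypothetical element names
$xml->appendChild($root);

$keywords = $xml->createElement('keywords');
// Assigning textContent stores the string as literal text; the "&" is escaped on output.
$keywords->textContent = $raw;
$root->appendChild($keywords);

echo $xml->saveXML($keywords), PHP_EOL;
```

Because the value is stored as a text node, the serializer escapes the ampersand (`&amp;#44;`), so a consumer that parses the XML reads back exactly the string that was in the HTML attribute.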
