Extracting HTML attributes in PHP using regex

Time: 2014-03-22 14:16:31

Tags: php html regex

I want to extract the attributes of an HTML string in PHP, but my attempts fail:

$string = '<ul id="value" name="Bob" custom-tag="customData">';
preg_filter("/(\w[-\w]*)=\"(.*?)\"/", '$1', $string ); // returns "<ul id name custom-tag>"
preg_filter("/(\w[-\w]*)=\"(.*?)\"/", '$2', $string ); // returns "<ul value Bob customData>"

What I want returned is:

array(
  'id' => 'value',
  'name' => 'Bob',
  'custom-tag' => 'customData'
);

2 answers:

Answer 0: (score: 4)

Don't use regexes for parsing HTML

$string = '<ul id="value" name="Bob" custom-tag="customData">';
$dom = new DOMDocument();
@$dom->loadHTML($string);
$ul = $dom->getElementsByTagName('ul')->item(0);
echo $ul->getAttribute("id");
echo $ul->getAttribute("name");
echo $ul->getAttribute("custom-tag");
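The `@` operator above hides every warning from `loadHTML()`, including ones that might be worth seeing. As a rough sketch of an alternative, assuming the libxml extension that ships with PHP's DOM support, parse warnings can be collected with `libxml_use_internal_errors()` and inspected afterwards. The `$ul !== null` check is an addition, not part of the original answer; it guards against input that contains no `<ul>` element.

```php
<?php
$string = '<ul id="value" name="Bob" custom-tag="customData">';

// Buffer libxml warnings instead of silencing them with @.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($string);

// Any parse problems are available here for logging or debugging.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$ul = $dom->getElementsByTagName('ul')->item(0);

// item(0) returns null when no matching element exists.
if ($ul !== null) {
    echo $ul->getAttribute('id'), "\n";
    echo $ul->getAttribute('name'), "\n";
    echo $ul->getAttribute('custom-tag'), "\n";
}
```

Each `LibXMLError` entry in `$errors` has `message` and `line` properties, which is usually enough to decide whether the input was merely sloppy or badly broken.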

Answer 1: (score: 4)

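HTML is not a regular language and cannot be parsed correctly with regular expressions. Use a DOM parser instead. Here is a solution using PHP's built-in DOMDocument class: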

$string = '<ul id="value" name="Bob" custom-tag="customData">';

$dom = new DOMDocument();
$dom->loadHTML($string);

$result = array();

$ul = $dom->getElementsByTagName('ul')->item(0);
if ($ul !== null && $ul->hasAttributes()) {
    foreach ($ul->attributes as $attr) {
        $name = $attr->nodeName;
        $value = $attr->nodeValue;    
        $result[$name] = $value;
    }
}

print_r($result);

Output:

Array
(
    [id] => value
    [name] => Bob
    [custom-tag] => customData
)
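
If the same extraction is needed in several places, the loop above could be wrapped in a small function. The following is only a sketch: the name `extractAttributes` and its parameters are hypothetical and do not come from the answers, and it assumes PHP 7 or later for the type declarations. It returns the attributes of the first element with the given tag name, or an empty array if there is none.

```php
<?php
/**
 * Hypothetical helper (not part of the original answers): returns the
 * attributes of the first <$tagName> element in $html as name => value.
 */
function extractAttributes(string $html, string $tagName): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return [];
    }

    // Keep libxml warnings out of the output while parsing.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementsByTagName($tagName)->item(0);

    $result = [];
    if ($element !== null) {
        foreach ($element->attributes as $attr) {
            $result[$attr->nodeName] = $attr->nodeValue;
        }
    }

    return $result;
}

$attributes = extractAttributes(
    '<ul id="value" name="Bob" custom-tag="customData">',
    'ul'
);
print_r($attributes);
```

For the string from the question this yields the same three entries as the output above, in the array shape the question asks for.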