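Below is a list of example strings. I am looking for a regular expression that returns the name of the first HTML tag in the string.
Exceptions:
If the anchor tag <a> is the first tag in the string, it should return an empty string ''.
Also, if the string does not contain any HTML tags, it should return an empty string ''.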
$string = '<h6>Test content</h6>';
// Expected output = h6
$string = '<h6 class="my-class">Test content</h6>';
// Expected output = h6
$string = '<div>Test content</div>';
// Expected output = div
$string = '<div id="my-id" class="my-class">Test content</div>';
// Expected output = div
$string = '<div><a href="test.html">Test content</a></div>';
// Expected output = div
$string = '<div class="my-class"><a href="test.html">Test content</a></div>';
// Expected output = div
$string = '<a href="test.html">Test content</a>';
// Expected output = empty string
// It should return an empty string if the first HTML tag is <a>
$string = "Test content";
// Expected output = empty string
// It should return an empty string if there is no wrapping HTML tag.
Please help!
Answer 0 (score: 1)
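Set a default $element to an empty string; it is used both when the incoming string contains no HTML and when the first element is an a. First, check whether the string contains any HTML tags by comparing the incoming string with the value of strip_tags($string). If the two are identical, there are no HTML tags, so skip to the bottom and return $element, which is an empty string.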
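The tag check described above can be tried on its own. This is only a minimal sketch: the helper name containsHtmlTags() is hypothetical and does not appear in the answer's code; strip_tags() is the only library call it relies on.

```php
<?php
// Hypothetical helper name, introduced only for this illustration.
function containsHtmlTags(string $string): bool
{
    // strip_tags() returns the input unchanged when it finds no tags,
    // so any difference means something tag-like was removed.
    return $string !== strip_tags($string);
}

var_dump(containsHtmlTags('<h6>Test content</h6>')); // bool(true)
var_dump(containsHtmlTags('Test content'));          // bool(false)
```

Note that strip_tags() also removes HTML comments, so a string containing only a comment would be reported as containing tags by this check.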
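If there are HTML tags, load the string into a DOMDocument and use XPath to get the name of the first node. If it is an a, leave $element empty; if it is not an a, set $element to the node name. Either way, break out of the loop.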
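The XPath expression is /html/body/* because loadHTML() adds <html> and <body> tags when it is given invalid or partial HTML. For strings that contain no HTML, or that start with bare text, it also wraps that text in a <p> tag.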
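To see which elements the /html/body/* query actually visits, something like the following can help. It is a sketch only: topLevelNodeNames() is a hypothetical helper, not part of the answer, and the exact wrapping may depend on the libxml version in use.

```php
<?php
// Hypothetical helper for illustration: lists the element names that
// end up directly under <body> after loadHTML() parses a fragment.
function topLevelNodeNames(string $fragment): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($fragment);
    $xpath = new DOMXPath($doc);

    $names = [];
    foreach ($xpath->query('/html/body/*') as $node) {
        $names[] = $node->nodeName;
    }
    return $names;
}

print_r(topLevelNodeNames('<h6>Test content</h6>'));
print_r(topLevelNodeNames('Test <div>content</div>'));
```

For the second string, the saveHTML() output listed in this answer suggests the query would yield a p followed by a div, which is why the function has to skip a leading p when the input does not start with a tag.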
function getFirstElement($string) {
    // Default result: used when there is no HTML or the first tag is <a>
    $element = '';
    // strip_tags() changes the string only if it contains tags
    if ($string !== strip_tags($string)) {
        $doc = new DOMDocument();
        $doc->loadHTML($string);
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query('/html/body/*') as $node) {
            // Leading text gets wrapped in a <p> by loadHTML(); skip that wrapper
            if (substr($string, 0, 1) !== '<' && $node->nodeName === 'p') {
                continue;
            }
            // Report the first real element unless it is an anchor
            if ($node->nodeName !== 'a') {
                $element = $node->nodeName;
            }
            break;
        }
    }
    return $element;
}
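One practical caveat: loadHTML() emits PHP warnings for markup that libxml considers invalid, which includes tags it does not recognize, such as HTML5 elements. If that matters, the parsing step could be wrapped roughly as below. This is a sketch under assumptions, not part of the original answer; loadFragmentQuietly() is a hypothetical name.

```php
<?php
// Hypothetical variant of the parsing step; the function name is an
// assumption made for this sketch.
function loadFragmentQuietly(string $string): DOMDocument
{
    // Collect libxml parse errors internally instead of emitting warnings
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($string);

    // Discard the collected errors and restore the previous setting
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadFragmentQuietly('<section>Test content</section>');
echo $doc->getElementsByTagName('section')->length, "\n";
```

Restoring the previous libxml_use_internal_errors() value keeps the change local, so other code that relies on the default warning behavior is not affected.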
$string = '<h6>Test content</h6>';
getFirstElement($string); // returns 'h6'
$string = '<h6 class="my-class">Test content</h6>';
getFirstElement($string); // returns 'h6'
$string = '<div>Test content</div>';
getFirstElement($string); // returns 'div'
$string = '<div id="my-id" class="my-class">Test content</div>';
getFirstElement($string); // returns 'div'
$string = '<div><a href="test.html">Test content</a></div>';
getFirstElement($string); // returns 'div'
$string = '<div class="my-class"><a href="test.html">Test content</a></div>';
getFirstElement($string); // returns 'div'
$string = '<a href="test.html">Test content</a>';
getFirstElement($string); // returns ''
$string = "Test content";
getFirstElement($string); // returns ''
$string = "Test <div>content</div>";
getFirstElement($string); // returns 'div'
$string = "<p>Test content</p>";
getFirstElement($string); // returns 'p'
To show what DOMDocument::loadHTML() does with each string, here is the output of DOMDocument::saveHTML():
$string = '<h6>Test content</h6>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><h6>Test content</h6></body></html>
$string = '<h6 class="my-class">Test content</h6>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><h6 class="my-class">Test content</h6></body></html>
$string = '<div>Test content</div>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>Test content</div></body></html>
$string = '<div id="my-id" class="my-class">Test content</div>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="my-id" class="my-class">Test content</div></body></html>
$string = '<div><a href="test.html">Test content</a></div>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div><a href="test.html">Test content</a></div></body></html>
$string = '<div class="my-class"><a href="test.html">Test content</a></div>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div class="my-class"><a href="test.html">Test content</a></div></body></html>
$string = '<a href="test.html">Test content</a>';
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><a href="test.html">Test content</a></body></html>
$string = "Test content";
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Test content</p></body></html>
$string = "Test <div>content</div>";
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Test </p><div>content</div></body></html>
$string = "<p>Test content</p>";
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>Test content</p></body></html>
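Since the question asked for a regular expression, here is a rough regex-only sketch for comparison. It is not the approach this answer uses; getFirstTagName() is a hypothetical name, and the pattern is an assumption that only handles simple markup. It does not understand comments or a > inside an attribute value.

```php
<?php
// Hypothetical regex-based variant; the function name is an assumption.
function getFirstTagName(string $string): string
{
    // Match the first opening tag and capture its name
    if (preg_match('/<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/', $string, $matches)) {
        $name = strtolower($matches[1]);
        // An anchor as the first tag counts as "no result"
        return $name === 'a' ? '' : $name;
    }
    // No opening tag found at all
    return '';
}

var_dump(getFirstTagName('<div class="my-class"><a href="test.html">Test content</a></div>')); // string(3) "div"
var_dump(getFirstTagName('<a href="test.html">Test content</a>'));                             // string(0) ""
var_dump(getFirstTagName('Test content'));                                                     // string(0) ""
```

The DOMDocument version is more tolerant of unusual markup, while the regex version avoids building a document tree for short, well-behaved strings.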