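I can currently convert HTML to JSON. The function element_to_obj parses the HTML and gives me a JSON object containing the HTML content. My main question: is it possible to return only the values of the href attributes in the JSON object and ignore everything else?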
function html_to_obj($html) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    return element_to_obj($dom->documentElement);
}
function element_to_obj($element) {
    $obj = array( "tag" => $element->tagName );
    foreach ($element->attributes as $attribute) {
        $obj[$attribute->name] = $attribute->value;
    }
    foreach ($element->childNodes as $subElement) {
        if ($subElement->nodeType == XML_TEXT_NODE) {
            $obj["html"] = $subElement->wholeText;
        }
        else {
            $obj["children"][] = element_to_obj($subElement);
        }
    }
    return $obj;
}
$html = <<<EOF
<!DOCTYPE html>
<html lang="en">
<head>
<title> This is a test </title>
</head>
<body>
<h1> Go to a site? </h1>
<ul>
<li> <a href="http://example.com">Some Site</a> </li>
<li> <a href="http://example.com">Some Site</a> </li>
</ul>
<h1> Other sites to visit: </h1>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
</body>
</html>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
Answer 0 (score: 0)
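I think the best approach is to write a simple text parser. Search each JSON object for instances of href=" and return the string that follows, up to the next unescaped ". If I remember correctly, JavaScript has basic functions such as string.substring that would work for this. Alternatively, if you are comfortable with regular expressions, you could use a regex.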
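Because the question already builds a PHP array before calling json_encode, the "search for href" idea can also be applied to that array directly rather than to the JSON text. The following is only a minimal sketch under that assumption: collect_hrefs is a hypothetical helper name, and the sample data is made up to mimic the shape that element_to_obj produces in the question.

```php
<?php
// Hypothetical helper: gather every value stored under an "href" key,
// at any depth of a nested array like the one element_to_obj() builds.
function collect_hrefs(array $tree) {
    $hrefs = array();
    // array_walk_recursive visits only non-array (leaf) values, with their keys.
    array_walk_recursive($tree, function ($value, $key) use (&$hrefs) {
        if ($key === "href") {
            $hrefs[] = $value;
        }
    });
    return $hrefs;
}

// Made-up sample in the same shape as the question's output.
$tree = array(
    "tag" => "ul",
    "children" => array(
        array("tag" => "li", "children" => array(
            array("tag" => "a", "href" => "http://example.com/a", "html" => "Some Site"),
        )),
        array("tag" => "li", "children" => array(
            array("tag" => "a", "href" => "http://example.com/b", "html" => "Some Site"),
        )),
    ),
);

echo json_encode(collect_hrefs($tree), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```

Walking the array avoids having to handle JSON string escaping by hand, which a substring or regex scan of the encoded text would need to get right.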
Answer 1 (score: 0)
You can use getElementsByTagName and then iterate over all the matching elements.
<?php
function html_to_obj($html, $tag = 'a') {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    // Collect every element with the given tag name (anchors by default).
    return element_to_obj($dom->getElementsByTagName($tag));
}
function element_to_obj($elements) {
    $obj = array();
    foreach ($elements as $index => $element) {
        $obj[$index] = array( "tag" => $element->tagName );
        foreach ($element->attributes as $attribute) {
            $obj[$index][$attribute->name] = $attribute->value;
        }
        foreach ($element->childNodes as $subElement) {
            if ($subElement->nodeType == XML_TEXT_NODE) {
                $obj[$index]["html"] = $subElement->wholeText;
            }
            elseif ($subElement->nodeType == XML_ELEMENT_NODE) {
                // element_to_obj() expects a list, so wrap the single child element.
                $children = element_to_obj(array($subElement));
                $obj[$index]["children"][] = $children[0];
            }
        }
    }
    return $obj;
}
$html = <<<EOF
<!DOCTYPE html>
<html lang="en">
<head>
<title> This is a test </title>
</head>
<body>
<h1> Go to a site? </h1>
<ul>
<li> <a href="http://example.com">Some Site</a> </li>
<li> <a href="http://example.com">Some Site</a> </li>
</ul>
<h1> Other sites to visit: </h1>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
<div><a href="http://example.com">Some Site</a></div>
</body>
</html>
EOF;
header("Content-Type: text/plain");
echo json_encode(html_to_obj($html), JSON_PRETTY_PRINT);
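The code above still returns the tag name, the text, and every attribute of each anchor. If only the href values are wanted, as the question asks, the same getElementsByTagName idea can be trimmed down. This is a rough sketch rather than a definitive implementation: html_to_hrefs is a hypothetical name, the sample markup is made up, and suppressing libxml warnings is an assumption about what is acceptable for your input.

```php
<?php
// Hypothetical helper: return only the href value of each <a> element.
function html_to_hrefs($html) {
    $dom = new DOMDocument();
    // Assumption: parser warnings about imperfect markup are not needed here.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $hrefs = array();
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // Skip anchors that have no href attribute at all.
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

// Made-up sample markup: two links and one anchor without an href.
$html = '<ul>'
      . '<li><a href="http://example.com/a">A</a></li>'
      . '<li><a name="x">no link</a></li>'
      . '<li><a href="http://example.com/b">B</a></li>'
      . '</ul>';

echo json_encode(html_to_hrefs($html), JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```

Returning a flat list keeps the JSON output to the URLs alone; if the link text is also needed, each entry could be turned into a small array instead.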