I am trying to replace <a> tags in a block of HTML using XPath. I search for the <a> tags, pull each one out, modify it, and re-insert it.
Here is my code:
$html_fragment = '<div class="someclass">... A bunch of HTML content ...</div>';
$esp_dom = new DOMDocument();
$esp_dom->formatOutput = false;
$esp_dom->preserveWhiteSpace = false;
$esp_dom->strictErrorChecking = false;
$esp_dom->validateOnParse = false;
$esp_dom->loadHTML($html_fragment);
$esp_xpath = new DOMXPath($esp_dom);
$esp_anchors = $esp_xpath->query('//a');
$esp_anchor_count = $esp_anchors->length;
if ($esp_anchor_count > 0){
for ($z=$esp_anchor_count; $z>0; --$z){
$esp_anchor = $esp_anchors->item($z-1);
$esp_anchor_fragment = $esp_dom->saveHTML($esp_anchor);
$esp_anchor_fragment = dom_savehtml_cleanup($esp_anchor_fragment);
$esp_anchor_new = $esp_anchor_fragment.'<!--DPM: New Comment After Tag -->';
/* !!! At this point $esp_anchor is the HTML of the original fragment !!! */
/* !!! At this point $esp_anchor_new is the correct HTML I want to replace the original node !!! */
$esp_anchor_code = $esp_dom->createDocumentFragment();
$esp_anchor_code->appendXML($esp_anchor_new);
$esp_anchor->parentNode->replaceChild($esp_anchor_code, $esp_anchor);
/* !!! NOW at this point $esp_anchor is completely empty !!! */
}
}
function dom_savehtml_cleanup($html){
$html = trim(str_replace('@nbsp;', ' ', $html));
$html = str_replace('<?xml version="1.0" encoding="UTF-8"?>', '', $html);
return $html;
}
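As a side note, when I start with loadHTML I am able to load the HTML into the DOM. I cannot use loadXML, because it sees errors in the HTML and fails to load it properly.
Any ideas why replaceChild leaves that node empty? Thanks again.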
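Edit: Below is the HTML I have stored in $html_fragment. In the live code it is passed in as a parameter, but this is what the variable contains.
The result I get is a completely empty document fragment. As far as I can tell, nothing is returned.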
<div class="sp_content_item sp_content_item type_insert_headline type_standard_orientation sp_text_align_left has_impression_region" data-additional_padding="0" data-columns="1" data-column_padding="0" data-container_width="686" data-orientation_type="standard_orientation" data-orientation_type_name="" data-align="left" data-tracking_esp="omeda" data-impressionregion="TE5URVNU|TExMTENDQw==|THVjaW91cyBW|THVrZSBDYWdl">
<table class="sp_layout_headline" id="post-dc2558524822540c332fba8e6162d2b9" width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="item_separator" style="padding-bottom:25px"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="sp_text_headline_wrapper" style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="sp_text_headline" style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;background-color:transparent"><table width="100%" border="0" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td class="sp_text_title sp_text_headline_title" style="text-align:left;color:#222222;font-size:30px;font-family:Arial , Helvetica , sans-serif;font-weight:bold;line-height:1.2em;padding-bottom:3px"><a href="" target="_blank" class="sp_text_title_text" style="color:#222222;text-decoration:none;font-weight:normal">Title Of The Item</a></td>
</tr>
<tr>
<td class="sp_text_description sp_text_headline_desc" style="text-align:left;color:#555555;font-size:16px;font-family:Arial , Helvetica , sans-serif;line-height:1.2em;padding-bottom:5px;font-weight:normal">Description - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam</td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table></td>
</tr>
</tbody>
</table>
</div>
Answer 0 (score: 0)
Long story short, appendXML failed because the HTML is not valid XML ...
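To illustrate the failure mode, here is a minimal sketch, assuming the anchors in the real content contain markup that is valid HTML but not well-formed XML. The two sample strings are made up for this example and are not taken from the question. appendXML returns a boolean, so checking it before calling replaceChild shows whether the fragment was actually populated.

```php
<?php
$dom = new DOMDocument();

// Well-formed XML: every tag is closed, so the fragment is populated.
$ok = $dom->createDocumentFragment();
$ok_result = @$ok->appendXML('<a href="#">Title</a><!--DPM: New Comment After Tag -->');

// Valid HTML, but <br> is never closed, so the XML parser rejects it.
// The @ operator only hides the parser warning; the return value is what matters.
$bad = $dom->createDocumentFragment();
$bad_result = @$bad->appendXML('<a href="#">Title<br>More</a><!--DPM: New Comment After Tag -->');

var_dump($ok_result);
var_dump($bad_result);
```

If the second call reports failure, the fragment passed to replaceChild has nothing in it, which would be consistent with the empty result described in the question.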
As a result, I inserted unique placeholder strings and replaced them after rendering, for example:
$esp_fragment_array = [];
$esp_fragment_count = 0;
// Add HTML to an array to insert later. Will fail on the appendXML function otherwise
$esp_fragment_array[$esp_fragment_count] = $esp_anchor_code ;
$esp_fragment_placeholder = 'esp_tracking_placeholder_'.$esp_fragment_count;
$esp_fragment_count++;
...
// After processing, replace strings
for ($i=0; $i<$esp_fragment_count; $i++){
$haystack = str_replace("esp_tracking_placeholder_$i", $esp_fragment_array[$i], $haystack);
}
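For reference, the placeholder idea can be sketched end to end as follows. This is only one possible reading of the snippet above, not the author's exact code: the sample HTML, the variable names $replacements and $placeholder, and the use of a text node as the placeholder are assumptions made for this example. It also uses strtr with an array instead of a loop of str_replace calls, because a plain str_replace for esp_tracking_placeholder_1 would also match the start of esp_tracking_placeholder_10.

```php
<?php
// Hypothetical sample input; the <br> makes the anchor invalid as XML.
$html_fragment = '<div class="someclass"><a href="#one">First<br>link</a> and <a href="#two">second</a></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect HTML parser warnings instead of printing them
$dom->loadHTML($html_fragment);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$replacements = [];                 // placeholder string => final HTML
$count = 0;

foreach ($xpath->query('//a') as $anchor) {
    $placeholder = 'esp_tracking_placeholder_' . $count++;

    // Serialize the anchor as HTML and append the comment as a plain string.
    $replacements[$placeholder] = $dom->saveHTML($anchor) . '<!--DPM: New Comment After Tag -->';

    // Swap the anchor for a text node holding the placeholder (no XML parsing involved).
    $anchor->parentNode->replaceChild($dom->createTextNode($placeholder), $anchor);
}

// Render the wrapper element, then substitute every placeholder in one pass.
$haystack = $dom->saveHTML($dom->getElementsByTagName('div')->item(0));
$haystack = strtr($haystack, $replacements);

echo $haystack, PHP_EOL;
```

If the only change needed is a comment after each anchor, creating the node with createComment and inserting it with insertBefore on the anchor's nextSibling may avoid the string round trip altogether.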