Using XPath in PHP: loadXML fails to insert text correctly

Date: 2018-11-01 15:09:15

Tags: php xpath

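I'm trying to use XPath to replace the <a> tags in a block of HTML. I search for the <a> tags, pull each one out, modify it, and re-insert it.

Here is my code: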

    $html_fragment = '<div class="someclass">... A bunch of HTML content ...</div>';
    $esp_dom = new DOMDocument();
    $esp_dom->formatOutput = false;
    $esp_dom->preserveWhiteSpace = false;
    $esp_dom->strictErrorChecking = false;
    $esp_dom->validateOnParse = false;
    $esp_dom->loadHTML($html_fragment);

    $esp_xpath = new DOMXPath($esp_dom);
    $esp_anchors = $esp_xpath->query('//a');
    $esp_anchor_count = $esp_anchors->length;

    if ($esp_anchor_count > 0){
        for ($z=$esp_anchor_count; $z>0; --$z){
            $esp_anchor = $esp_anchors->item($z-1);

            $esp_anchor_fragment = $esp_dom->saveHTML($esp_anchor);
            $esp_anchor_fragment = dom_savehtml_cleanup($esp_anchor_fragment);


            $esp_anchor_new = $esp_anchor_fragment.'<!--DPM: New Comment After Tag -->';

            /* !!! At this point $esp_anchor_fragment is the HTML of the original node !!! */
            /* !!! At this point $esp_anchor_new is the correct HTML I want to replace the original node with !!! */
            $esp_anchor_code = $esp_dom->createDocumentFragment();
            $esp_anchor_code->appendXML($esp_anchor_new);
            $esp_anchor->parentNode->replaceChild($esp_anchor_code, $esp_anchor);
        /* !!! NOW at this point $esp_anchor is completely empty !!! */

        }
    }


    function dom_savehtml_cleanup($html){
      $html = trim(str_replace('@nbsp;', '&nbsp;', $html));
      $html = str_replace('<?xml version="1.0" encoding="UTF-8"?>', '', $html);
      return $html;
    }

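As a side note, I am able to load the HTML into the DOM when I start with loadHTML. I can't use loadXML because it sees errors in the HTML and fails to load it properly.

Any ideas why replaceChild leaves the node empty? Thanks again.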


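Edit: Below is the HTML I have stored in $html_fragment. In the live code it is passed in as a parameter, but this is what the variable contains.

The result I get is a completely empty document fragment. As far as I can tell, nothing is returned.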

  <div class="sp_content_item sp_content_item type_insert_headline type_standard_orientation sp_text_align_left has_impression_region" data-additional_padding="0" data-columns="1" data-column_padding="0" data-container_width="686" data-orientation_type="standard_orientation" data-orientation_type_name="" data-align="left" data-tracking_esp="omeda" data-impressionregion="TE5URVNU|TExMTENDQw==|THVjaW91cyBW|THVrZSBDYWdl">
    <table class="sp_layout_headline" id="post-dc2558524822540c332fba8e6162d2b9" width="100%" border="0" cellspacing="0" cellpadding="0">
      <tbody>
        <tr>
          <td class="item_separator" style="padding-bottom:25px"><table width="100%" border="0" cellspacing="0" cellpadding="0">
              <tbody>
                <tr>
                  <td class="sp_text_headline_wrapper" style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px"><table width="100%" border="0" cellspacing="0" cellpadding="0">
                      <tbody>
                        <tr>
                          <td class="sp_text_headline" style="padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;background-color:transparent"><table width="100%" border="0" cellspacing="0" cellpadding="0">
                              <tbody>
                                <tr>
                                  <td class="sp_text_title sp_text_headline_title" style="text-align:left;color:#222222;font-size:30px;font-family:Arial , Helvetica , sans-serif;font-weight:bold;line-height:1.2em;padding-bottom:3px"><a href="" target="_blank" class="sp_text_title_text" style="color:#222222;text-decoration:none;font-weight:normal">Title Of The Item</a></td>
                                </tr>
                                <tr>
                                  <td class="sp_text_description sp_text_headline_desc" style="text-align:left;color:#555555;font-size:16px;font-family:Arial , Helvetica , sans-serif;line-height:1.2em;padding-bottom:5px;font-weight:normal">Description - Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam</td>
                                </tr>
                              </tbody>
                            </table></td>
                        </tr>
                      </tbody>
                    </table></td>
                </tr>
              </tbody>
            </table></td>
        </tr>
      </tbody>
    </table>
  </div>

1 Answer:

Answer 0 (score: 0)

Long story short, appendXML fails because the HTML is not valid XML ...
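
The failure mode can be reproduced in isolation. The sketch below is only an illustration with made-up markup (the two strings are assumptions, not the asker's data): it feeds `DOMDocumentFragment::appendXML()` one well-formed snippet and one snippet that is acceptable HTML but not well-formed XML, and checks the return value, which is `false` when parsing fails.

```php
<?php
// Assumption: both markup strings are illustrative, not taken from the question.
libxml_use_internal_errors(true); // collect libxml parse errors instead of emitting warnings

$dom = new DOMDocument();

// Well-formed XML: every tag is closed, so appendXML() succeeds.
$ok_fragment = $dom->createDocumentFragment();
$ok = $ok_fragment->appendXML('<a href="#">Title</a><!-- comment -->');

// Fine as HTML, but not well-formed XML: <br> is never closed.
$bad_fragment = $dom->createDocumentFragment();
$bad = $bad_fragment->appendXML('<a href="#">Title<br></a>');

var_dump($ok);                            // bool(true)
var_dump($bad);                           // bool(false)
var_dump($bad_fragment->hasChildNodes()); // bool(false): nothing was added
```

Because the question's code ignores the return value, a failed `appendXML()` would leave an empty fragment that `replaceChild()` then swaps in for the anchor, which matches the "completely empty" result described above.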

As a result, I inserted unique placeholder strings and replaced them after rendering, for example:

    $esp_fragment_array = [];
    $esp_fragment_count = 0;

    // Store the HTML in an array to insert later; passing it to appendXML would fail.
    // Note: $esp_anchor_code must hold the replacement HTML as a string here.
    $esp_fragment_array[$esp_fragment_count] = $esp_anchor_code;
    $esp_fragment_placeholder = 'esp_tracking_placeholder_'.$esp_fragment_count;
    $esp_fragment_count++;

    ...

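    // After processing, replace the placeholder strings with the stored HTML.
    // Iterate from the highest index down so placeholder_1 cannot match inside placeholder_10.
    for ($i = $esp_fragment_count - 1; $i >= 0; $i--) {
      $haystack = str_replace("esp_tracking_placeholder_$i", $esp_fragment_array[$i], $haystack);
    }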
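
For reference, here is a minimal end-to-end sketch of the placeholder approach. It is not the answerer's full code (the elided part stays elided). The input markup, the trailing underscore on each placeholder, and the way the top-level element is picked for rendering are all assumptions made for illustration.

```php
<?php
// Hypothetical input; in the question the real HTML arrives as a parameter.
$html_fragment = '<div class="someclass"><p><a href="#one">One</a> and '
               . '<a href="#two">Two<br>lines</a></p></div>';

libxml_use_internal_errors(true); // keep parser complaints out of the output
$esp_dom = new DOMDocument();
$esp_dom->loadHTML($html_fragment);

$esp_xpath = new DOMXPath($esp_dom);
$esp_fragment_array = [];

// Copy the results into an array so replacing nodes cannot disturb the iteration.
foreach (iterator_to_array($esp_xpath->query('//a')) as $esp_anchor) {
    $esp_fragment_count = count($esp_fragment_array);

    // HTML to inject later; it never passes through an XML parser.
    $esp_fragment_array[] = $esp_dom->saveHTML($esp_anchor)
                          . '<!--DPM: New Comment After Tag -->';

    // Swap the anchor for a plain-text placeholder. The trailing underscore
    // (an addition in this sketch) keeps ..._1_ from matching inside ..._10_.
    $esp_placeholder = $esp_dom->createTextNode(
        'esp_tracking_placeholder_' . $esp_fragment_count . '_'
    );
    $esp_anchor->parentNode->replaceChild($esp_placeholder, $esp_anchor);
}

// Render only the original top-level element, then put the stored HTML back.
$esp_root = $esp_dom->getElementsByTagName('body')->item(0)->firstChild;
$haystack = $esp_dom->saveHTML($esp_root);
foreach ($esp_fragment_array as $i => $esp_html) {
    $haystack = str_replace('esp_tracking_placeholder_' . $i . '_', $esp_html, $haystack);
}

echo $haystack, "\n";
```

The design trade-off is that the replacement happens on the serialized string rather than in the DOM, so the injected HTML is never validated; that is exactly what makes it tolerant of markup that `appendXML()` would reject.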