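PHP: HTML string to DOMDocument doesn't return an array of every element

Time: 2014-10-22 22:18:10

Tags: php html arrays domdocument

I'm trying to write a function that converts an HTML string into a multidimensional array, where the parent keys are the tag names and the child entries are the attributes. But when I print_r() the function's result, it doesn't return every element.

The strings are originally part of a larger object that looks like this: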

Array
(
  [0] => stdClass Object
   (          
    [html] => 
      <input type="radio" name="radio1" value="18" checked="checked" id="80">
      <label class="other" for="80">Label for radio 1</label>
      <input type="radio" name="radio2" value="20" id="81">
      <label class="other" for="81">Label for radio 2</label>
   )

  [1] => stdClass Object
   (
    [html] => 
      <input type="radio" name="radio3" value="19" checked="checked" id="91">
      <label class="other" for="91">Label for radio 3</label>
      <input type="radio" name="radio4" value="21" id="92">
      <label class="other" for="92">Label for radio 4</label>
   )

)

Here is my function:

<?php
function htmltoarray($param){
    $doc = new DOMDocument();
        $doc->loadHTML($param);
        $doc->preserveWhiteSpace = false;
        $html = $doc->getElementsByTagName('*');        
        $form = array();
        foreach($html as $v){           
            $tag = $v->nodeName;
            $val = $v->nodeValue;           
            foreach($v->attributes as $k => $a){
                $form[$tag]['txt'] = utf8_decode($val);
                $form[$tag][$k] = $a->nodeValue;
            }
        }   
    return $form;
}

// AND I CALL THE FUNCTION HERE:
foreach($myobject as $formelement){
  $convertthis = $formelement->html;
  echo '<pre>'; print_r(htmltoarray($convertthis)); echo '</pre>';
}
?>

It then returns:

<pre>Array
(
 [input] => Array
   (
     [txt] => 
     [id] => 80
     [checked] => checked
     [type] => radio
     [value] => 20
     [name] => radio1
   )

 [label] => Array
   (
     [txt] => Label for radio 1
     [for] => 80
     [class] => other
  )

)
</pre>

<pre>Array
(
 [input] => Array
   (
     [txt] => 
     [id] => 92
     [checked] => checked
     [type] => radio
     [value] => 21
     [name] => radio4
   )

 [label] => Array
   (
     [txt] => Label for radio 4
     [for] => 92
     [class] => other
   )

)
</pre>

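As you can see, it returns the first two elements from the first string and the last two elements from the second string.

What am I missing? Why this strange result, and how can I fix it so that every element is returned?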

1 answer:

Answer 0: (score: 0)

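The values in the array are being overwritten, so you only get the last ones. Because $form[$tag] is keyed only by the tag name, every <input> writes to the same slot, and likewise every <label>. Instead, first collect each element's attributes in a temporary array, then merge that with the text value and push the result onto $form[$tag].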
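To see the overwrite in isolation, here is a minimal sketch without any DOM parsing. The $elements array is made-up data standing in for the attributes of two <input> tags; it is only an illustration, not part of the original code.

```php
<?php
// Hypothetical attribute sets standing in for two <input> elements.
$elements = array(
    array('name' => 'radio1', 'value' => '18'),
    array('name' => 'radio2', 'value' => '20'),
);

// Keyed assignment: both elements write to $overwritten['input'][...],
// so the second element replaces the values of the first.
$overwritten = array();
foreach ($elements as $attrs) {
    foreach ($attrs as $k => $v) {
        $overwritten['input'][$k] = $v;
    }
}

// Appending with []: each element gets its own numeric index.
$collected = array();
foreach ($elements as $attrs) {
    $collected['input'][] = $attrs;
}

print_r($overwritten); // a single entry, holding radio2's values
print_r($collected);   // two entries, one per element
```

The only difference between the two loops is the extra [] dimension, which is the same change applied in the function below.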

I improved it a bit:

function htmltoarray($param){
    $doc = new DOMDocument();
    $doc->loadHTML($param);
    $doc->preserveWhiteSpace = false;
    // get body children
    $html = $doc->getElementsByTagName('body')->item(0)->childNodes;
    $form = array();
    foreach($html as $v){
        if($v instanceof DOMElement) { // skip text, comment and other non-element nodes
            $tag = $v->nodeName;
            $val = $v->nodeValue;
            $attrs = array();
            foreach($v->attributes as $k => $a){ // gather all attributes
                $attrs[$k] = $a->nodeValue;
            }
            // merge dom text value with attributes
            $element = array_merge(array('txt' => utf8_decode($val)), $attrs);
            $form[$tag][] = $element; // push them inside with another dimension
                    // ^ this one
        }
    }

    return $form;
}

// AND I CALL THE FUNCTION HERE:
foreach($myobject as $formelement){
  $convertthis = $formelement->html;
  echo '<pre>'; print_r(htmltoarray($convertthis)); echo '</pre>';
}
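
Grouping by tag name loses the order in which inputs and labels appear relative to each other. If that order matters, one possible variant is a flat list with the tag stored as a field. This is only a sketch: htmlToElementList and the 'tag' key are hypothetical names, it assumes the fragment's elements end up as direct children of <body> as in the function above, and it leaves out the utf8_decode() call (deprecated as of PHP 8.2), so non-ASCII text would need separate handling.

```php
<?php
// Hypothetical variant: returns one flat list with elements in document order.
function htmlToElementList($html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $body = $doc->getElementsByTagName('body')->item(0);
    $list = array();
    if ($body === null) {
        return $list; // nothing was parsed into a body
    }
    foreach ($body->childNodes as $node) {
        if (!($node instanceof DOMElement)) {
            continue; // skip text and comment nodes
        }
        // Start with the tag name and text, then add every attribute.
        $entry = array('tag' => $node->nodeName, 'txt' => $node->nodeValue);
        foreach ($node->attributes as $name => $attr) {
            $entry[$name] = $attr->nodeValue;
        }
        $list[] = $entry;
    }
    return $list;
}

// Sample fragment taken from the question's first object.
$sample = '<input type="radio" name="radio1" value="18" checked="checked" id="80">'
        . '<label class="other" for="80">Label for radio 1</label>';
$elements = htmlToElementList($sample);
print_r($elements);
```

Each entry can then be read as $elements[$i]['tag'], $elements[$i]['txt'] and so on, with the index reflecting the element's position in the fragment.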
