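This is a follow-up to my question from yesterday - Recursive UL LI to PHP multi-dimensional array - I have almost managed to convert the HTML block into an array, but there are a couple of problems I can't solve. When processing the HTML block below, the output array does not quite follow the input (I can't see where I'm going wrong and need a fresh pair of eyes!).
I have included the following:
HTML block
It basically takes the following form: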
-A
-B
-C
----
-D
-E
-F
----
-G
-H
-I
As follows:
<li>
    <ul>
        <li>A</li>
        <li>
            <ul>
                <li>B</li>
                <li>
                    <ul>
                        <li>C</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
<li>
    <ul>
        <li>D</li>
        <li>
            <ul>
                <li>E</li>
                <li>
                    <ul>
                        <li>F</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
<li>
    <ul>
        <li>G</li>
        <li>
            <ul>
                <li>H</li>
                <li>
                    <ul>
                        <li>I</li>
                    </ul>
                </li>
            </ul>
        </li>
    </ul>
</li>
PHP function and processing
function process_ul($output_data, $data, $key, $level_data, $level_key){
    if(substr($data[$key], 0, 3) == '<ul'){
        // going down a level in the tree
        $level_key++;
        // check to see if the level key exists within the level data, else create it and set to zero
        if(!is_numeric($level_data[$level_key])){
            $level_data[$level_key] = 0;
        }
        // increment the key to look at the next line
        $key++;
        if(substr($data[$key], 0, 4) !== '</ul'){
            while(substr($data[$key], 0, 4) !== '</ul'){
                // whilst we don't have an end of list, do some recursion and keep processing the array
                $returnables = process_ul($output_data, $data, $key, $level_data, $level_key);
                $output_data = $returnables['output'];
                $data = $returnables['data'];
                $key = $returnables['key'];
                $level_data = $returnables['level_data'];
                $level_key = $returnables['level_key'];
            }
        }
    }
    if(substr($data[$key], 0, 4) !== '</ul' && $data[$key] !== "<li>" && $data[$key] !== "</li>"){
        // we don't want to be saving lines with no data or the ends of a list
        // get the array key value so we know where to save it in our array (basically so we can't overwrite anything that may already exist)
        $this_key = &$output_data;
        for($build_key=0;$build_key<($level_key+1); $build_key++){
            $this_key =& $this_key[$level_data[$build_key]];
        }
        if(is_array($this_key)){
            // look at the next key, find the next open one
            $this_key[(array_pop(array_keys($this_key))+1)] = $data[$key];
        } else {
            // a new entry, so nothing to worry about
            $this_key = $data[$key];
        }
        $level_data[$level_key]++;
    } else if(substr($data[$key], 0, 4) == '</ul'){
        // going up a level in the tree
        $level_key--;
    }
    // increment the key to look at the next line when we loop in a moment
    $key++;
    // prepare the data to be returned
    $return_me = array();
    $return_me['output'] = $output_data;
    $return_me['data'] = $data;
    $return_me['key'] = $key;
    $return_me['level_data'] = $level_data;
    $return_me['level_key'] = $level_key;
    // return the data
    return $return_me;
}
// explode the data coming in by looking at the new lines
$input_array = explode("\n", $html_ul_tree_in);
// get rid of any empty lines - we don't like those
foreach($input_array as $key => $value){
    if(trim($value) !== ""){
        $input_data[] = trim($value);
    }
}
// set the array and the starting level
$levels = array();
$levels[0] = 0;
$this_level = 0;
// loop around the data and process it
for($i=0; $i<count($input_data); $i){
    $returnables = process_ul($output_data, $input_data, $i, $levels, $this_level);
    $output_data = $returnables['output'];
    $input_data = $returnables['data'];
    $i = $returnables['key'];
    $levels = $returnables['level_data'];
    $this_level = $returnables['level_key'];
}
// let's see how we did
print_r($output_data);
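Output
Notice that D is in the wrong position: it should be at [0][2], not [0][1][2], and every item after D is also off by one position (I'm sure you can see that).
It basically takes the following form: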
-A
-B
-C
-D
----
-E
-F
-G
----
-H
-I
As follows:
Array
(
    [0] => Array
        (
            [0] => <li>A</li>
            [1] => Array
                (
                    [0] => <li>B</li>
                    [1] => Array
                        (
                            [0] => <li>C</li>
                        )
                    [2] => <li>D</li>
                )
            [2] => Array
                (
                    [1] => <li>E</li>
                    [2] => Array
                        (
                            [1] => <li>F</li>
                        )
                    [3] => <li>G</li>
                )
            [3] => Array
                (
                    [2] => <li>H</li>
                    [3] => Array
                        (
                            [2] => <li>I</li>
                        )
                )
        )
)
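Thanks for your time - any help getting the array to output correctly is much appreciated!
Answer 0 (score: 3)
IF your lists are always well-formed, you can use this to do what you want. It uses SimpleXML, so it will not tolerate errors or malformed markup in the input. If you want it to be forgiving, you will need to use DOM - the code would be slightly more complex, but not ridiculously so.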
function ul_to_array ($ul) {
    if (is_string($ul)) {
        if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
            trigger_error("Syntax error in UL/LI structure");
            return FALSE;
        }
        return ul_to_array($ul);
    } else if (is_object($ul)) {
        $output = array();
        foreach ($ul->li as $li) {
            $output[] = (isset($li->ul)) ? ul_to_array($li->ul) : (string) $li;
        }
        return $output;
    } else return FALSE;
}
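It takes data in the exact form given in the question - with no outer enclosing <ul> tag. If you want to pass the outer <ul> tag as part of the input string, just change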
if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
to
if (!$ul = simplexml_load_string($ul)) {
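To make the shape of the result concrete, here is a small usage sketch. It repeats the function from this answer so that it runs on its own; the input string is a shortened, hypothetical sample (the A/B/C branch from the question plus one plain item), not the asker's full data.

```php
<?php
// Same function as in the answer above, repeated so the snippet is self-contained.
function ul_to_array($ul) {
    if (is_string($ul)) {
        // Wrap the bare <li> items in a root <ul> so the string parses as one XML document
        if (!$ul = simplexml_load_string("<ul>$ul</ul>")) {
            trigger_error("Syntax error in UL/LI structure");
            return FALSE;
        }
        return ul_to_array($ul);
    } else if (is_object($ul)) {
        $output = array();
        foreach ($ul->li as $li) {
            // An <li> that contains a <ul> becomes a nested array, otherwise its text
            $output[] = (isset($li->ul)) ? ul_to_array($li->ul) : (string) $li;
        }
        return $output;
    } else return FALSE;
}

// Assumed sample input: no outer <ul>, as in the question
$html = '<li><ul><li>A</li><li><ul><li>B</li><li><ul><li>C</li></ul></li></ul></li></ul></li><li>D</li>';

$result = ul_to_array($html);
print_r($result);
```

For this input the result should be a two-element array: the nested array `["A", ["B", ["C"]]]` followed by the string `"D"`. Note that an `<li>` holding both text and a nested `<ul>` would lose its text, since the function returns only the nested list in that case.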
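Answer 1 (score: 1)
Here is a working example that parses the HTML with DOMDocument and converts it to an array using domNodeToArray(), a function available here: http://www.ermshaus.org/2010/12/php-transform-domnode-to-array
The HTML does not need to be well-formed.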
// $inputHTML is your HTML-list as a string
// this is necessary to prevent DOMDocument errors on HTML5-elements
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// UTF-8 hack, to correctly handle UTF-8 through DOMDocument
$dom->loadHTML('<?xml encoding="UTF-8">' . $inputHTML);
// get the first list-element in the HTML-document
$listAsDom = $dom->getElementsByTagName('ul')->item(0);
// print it out as array
var_dump(domNodeToArray($listAsDom));
/**
 * Transforms the contents of a DOMNode to an associative array
 * @author Marc Ermshaus
 * http://www.ermshaus.org/2010/12/php-transform-domnode-to-array
 *
 * @param DOMNode $node DOMDocument node
 * @return mixed Associative array or string with node content
 */
function domNodeToArray(DOMNode $node) {
    $ret = '';
    if ($node->hasChildNodes()) {
        if ($node->firstChild === $node->lastChild
            && $node->firstChild->nodeType === XML_TEXT_NODE
        ) {
            // Node contains nothing but a text node, return its value
            $ret = trim($node->nodeValue);
        } else {
            // Otherwise, do recursion
            $ret = array();
            foreach ($node->childNodes as $child) {
                if ($child->nodeType !== XML_TEXT_NODE) {
                    // If there's more than one node with this node name on the
                    // current level, create an array
                    if (isset($ret[$child->nodeName])) {
                        if (!is_array($ret[$child->nodeName])
                            || !isset($ret[$child->nodeName][0])
                        ) {
                            $tmp = $ret[$child->nodeName];
                            $ret[$child->nodeName] = array();
                            $ret[$child->nodeName][] = $tmp;
                        }
                        $ret[$child->nodeName][] = domNodeToArray($child);
                    } else {
                        $ret[$child->nodeName] = domNodeToArray($child);
                    }
                }
            }
        }
    }
    return $ret;
}
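Because domNodeToArray() keys its result by node name, list items end up under an 'li' key rather than in a plain positional array. If a purely positional nested array is preferred (the shape the question asks for), the DOM approach can be narrowed to UL/LI only. The following is a minimal sketch under that assumption; `dom_ul_to_array` is a hypothetical helper name, not part of either answer or of any library, and the sample input with a deliberately unclosed last `<li>` is made up for illustration.

```php
<?php
// Hypothetical helper: converts a <ul> DOMElement into a nested positional array.
function dom_ul_to_array(DOMElement $ul) {
    $output = array();
    foreach ($ul->childNodes as $li) {
        // Skip whitespace text nodes and anything that is not an <li>
        if (!($li instanceof DOMElement) || $li->nodeName !== 'li') {
            continue;
        }
        // Look for a nested <ul> that is a direct child of this <li>
        $nested = null;
        foreach ($li->childNodes as $child) {
            if ($child instanceof DOMElement && $child->nodeName === 'ul') {
                $nested = $child;
                break;
            }
        }
        // A nested list becomes a sub-array, a plain item becomes its trimmed text
        $output[] = $nested ? dom_ul_to_array($nested) : trim($li->textContent);
    }
    return $output;
}

// Collect parser warnings internally instead of emitting them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
// Assumed sample input; the last <li> is intentionally left unclosed
$dom->loadHTML('<ul><li>A</li><li><ul><li>B</li><li>C</li></ul></li><li>D</ul>');
$result = dom_ul_to_array($dom->getElementsByTagName('ul')->item(0));
print_r($result);
```

With this input the result should be `["A", ["B", "C"], "D"]`, since the HTML parser closes the dangling `<li>` on its own. As with the SimpleXML version, text that sits next to a nested `<ul>` inside the same `<li>` is dropped in this sketch.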