Parsing a text file containing XML markup into an array

Asked: 2014-10-21 11:23:01

Tags: php arrays regex parsing

I need to parse a text file that contains HTML tags like the following:

<item>
 <value4="L5u9eDNV40_val4">
 <value6="xcE90l2HyN_val6">
 <value3="hJyVXoE4YQ_val3">
 <value5="K68yGpDsTR_val5">
 <value2="umrVvR8Tfe_val2">
 <value1="y6Ms2E5BHe_val1">
</item>

<item>
 <value4="T4PFOipm3u_val4">
 <value2="upLkW2r8nq_val2">
 <value3="3h7lV6CaHP_val3">
 <value5="4pETv3bt5c_val5">
 <value1="iEPZCnzxjs_val1">
 <value6="fWjg1Ueo5M_val6">
</item>

I need to do this in PHP, and the result should be an array like this:

array (size=10000) 
0 => array (size = 3) 
'value1' => string 'L5u9eDNV40_val4',
'value2' => string 'umrVvR8Tfe_val2',
'value4' => string 'T4PFOipm3u_val4'

I tried this with Simple HTML DOM, but I couldn't get anything useful out of it.

2 answers:

Answer 0 (score: 1)

<(value\d+)="([^"]*)"

Try this and grab the captures: group 1 is the tag name, group 2 is the quoted value. See the demo.

http://regex101.com/r/lD8uH4/3
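
A minimal sketch of how the captures might be collected in PHP, assuming the pattern above is applied with preg_match_all to a single <item> block. The variable names $item and $pairs are illustrative and not part of the answer.

```php
<?php
// Sample input: part of one <item> block from the question.
$item = <<<TXT
<item>
 <value4="L5u9eDNV40_val4">
 <value6="xcE90l2HyN_val6">
</item>
TXT;

// Group 1 captures the tag name (value + digits), group 2 the quoted string.
// PREG_SET_ORDER returns one entry per match, holding both groups together.
$pairs = array();
if (preg_match_all('/<(value\d+)="([^"]*)"/', $item, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $pairs[$m[1]] = $m[2];
    }
}

print_r($pairs);
```

Using [^"]* rather than .+? keeps each match inside a single pair of quotes and also accepts empty values.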

Answer 1 (score: 0)

It isn't clear what final data structure you want, but this code creates an array of arrays, $v_arr, in which each subarray holds the values of one <item>:

# $text is assumed to already hold the file contents (e.g. read with file_get_contents())
$v_arr = array();
# split the string up into an array with one <item> per array element
$items = explode("<item>", $text);
foreach ($items as $i) {
    # only parse entries that have <value... tags
    if (strpos($i, '<value') !== false) {
        # parse the value tags, save the matches in $matches
        # \d+ also matches names with several digits, such as value10
        if (preg_match_all('#<(value\d+)="(.+?)">#', $i, $matches)) {
            # create a new array with valueX as keys, the other string as values.
            # push this array on to a results array
            $v_arr[] = array_combine( $matches[1], $matches[2] );
        }
    }
}
print_r($v_arr);

Output for the text you posted:

Array
(
    [0] => Array
        (
            [value4] => L5u9eDNV40_val4
            [value6] => xcE90l2HyN_val6
            [value3] => hJyVXoE4YQ_val3
            [value5] => K68yGpDsTR_val5
            [value2] => umrVvR8Tfe_val2
            [value1] => y6Ms2E5BHe_val1
        )

    [1] => Array
        (
            [value4] => T4PFOipm3u_val4
            [value2] => upLkW2r8nq_val2
            [value3] => 3h7lV6CaHP_val3
            [value5] => 4pETv3bt5c_val5
            [value1] => iEPZCnzxjs_val1
            [value6] => fWjg1Ueo5M_val6
        )

)
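
If only some of the keys are needed, which the expected output in the question may suggest, the same idea can be wrapped in a function that filters each subarray. This is only a sketch under that assumption: parse_items and its $keep parameter are hypothetical names, and it matches whole <item>...</item> blocks with a regex instead of using explode.

```php
<?php
// Hypothetical helper: returns one associative array per <item> block.
// $keep, if non-empty, lists the only keys to retain in each subarray.
function parse_items($text, array $keep = array())
{
    $result = array();
    // Match each <item>...</item> block; the s flag lets . span newlines.
    if (!preg_match_all('#<item>(.*?)</item>#s', $text, $blocks)) {
        return $result;
    }
    foreach ($blocks[1] as $block) {
        if (!preg_match_all('#<(value\d+)="([^"]*)">#', $block, $m)) {
            continue; // skip items that contain no value tags
        }
        $row = array_combine($m[1], $m[2]);
        if ($keep) {
            // Keep only the requested keys.
            $row = array_intersect_key($row, array_flip($keep));
        }
        // Natural order, so value2 sorts before value10.
        ksort($row, SORT_NATURAL);
        $result[] = $row;
    }
    return $result;
}

// Small made-up input in the same format as the question's file.
$text = '<item> <value4="a_val4"> <value1="b_val1"> <value2="c_val2"> </item>'
      . '<item> <value2="d_val2"> <value4="e_val4"> </item>';

print_r(parse_items($text, array('value1', 'value4')));
```

Calling parse_items($text) without the second argument keeps every key, matching the structure printed above apart from the key order.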