Unable to extract td values from an HTML table string parsed with PHP XPath

Time: 2018-01-16 23:13:07

Tags: php html xpath

I have the following HTML string snippet from a Wikipedia page...

<table class="wikitable">
<tbody>
 <tr>
     <td>mod_access</td>
     <td>Versions older than 2.1</td>
     <td>Included by Default</td>
 </tr>
 <tr>
     <td>mod_actions</td>
     <td>Versions 1.1 and later</td>
     <td>Included by Default</td>
 </tr>
 <tr>
     <td>mod_alias</td>
     <td>Versions 1.1 and later</td>
     <td>Included by Default</td>
 </tr>
</tbody>
</table>

I have the following PHP code....

ini_set('display_errors', 'On');
$url = "https://en.wikipedia.org/wiki/List_of_Apache_modules";
$dom = new DomDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHtmlFile($url);
$xpath = new DomXpath($dom);
$elements = $xpath->query('//*[@id="mw-content-text"]/div/table/tbody/tr/td');
foreach ($elements as $i => $row) {
    $tds = $xpath->query('td', $row);
    foreach ($tds as $td) {
        echo "Td($i):", $td->nodeValue, "\n";
    }
}

What I want is a numeric array, with each index holding the td values of one row.

I'm not too sure what to do next.

1 Answer:

Answer 0: (score: 1)

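Your first XPath query ends in td, so each $row is already a td element and the inner query for 'td' relative to it finds nothing. The markup that DomDocument loads also appears to have no tbody under the table (browsers typically add that element themselves when displaying a page). If you remove tbody and td from the first XPath query, it will find all of the tr elements: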

$elements = $xpath->query('//*[@id="mw-content-text"]/div/table/tr');

Then you can loop over each row, use your existing code to find the td elements, and add them to an array:

$data = array();
foreach ($elements as $y => $row) {
    $tds = $xpath->query('td', $row);
    foreach($tds as $x => $td) {
        $data[$y][$x] = $td->nodeValue;
    }
}
var_dump($data);

Tested with PHP 5.6, this gives the following output:

array(157) {
  [1]=>
  array(6) {
    [0]=>
    string(10) "mod_access"
    [1]=>
    string(23) "Versions older than 2.1"
    [2]=>
    string(19) "Included by Default"
    [3]=>
    string(26) "Apache Software Foundation"
    [4]=>
    string(27) "Apache License, Version 2.0"
    [5]=>
    string(71) "Provides access control based on the client and the client's request[2]"
  }
  [2]=>
  array(6) {
    [0]=>
    string(11) "mod_actions"
    [1]=>
    string(22) "Versions 1.1 and later"
    [2]=>
    string(19) "Included by Default"
    [3]=>
    string(26) "Apache Software Foundation"
    [4]=>
    string(27) "Apache License, Version 2.0"
    [5]=>
    string(62) "Provides CGI ability based on request method and media type[3]"
  }
// etc ...
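
The output above starts at index 1, presumably because the first tr holds th header cells, so no td values are stored for row 0. Below is a minimal, self-contained sketch of the same technique that can be run without fetching the page. It assumes a small inline HTML sample (the header names and the wrapping div elements are hypothetical stand-ins for the Wikipedia markup) and uses the predicate tr[td] to select only rows that contain at least one td, so the resulting array is indexed from 0. This is one possible variation, not the only way to do it.

```php
<?php
// Hypothetical inline sample standing in for the Wikipedia table.
// The header row uses th cells, as data tables commonly do.
$html = <<<HTML
<div id="mw-content-text"><div>
<table class="wikitable">
 <tr><th>Name</th><th>Compatibility</th><th>Status</th></tr>
 <tr><td>mod_access</td><td>Versions older than 2.1</td><td>Included by Default</td></tr>
 <tr><td>mod_actions</td><td>Versions 1.1 and later</td><td>Included by Default</td></tr>
</table>
</div></div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$data = array();
// "//tr" matches rows whether or not a tbody sits between table and tr;
// the [td] predicate keeps only rows that have at least one td child.
foreach ($xpath->query('//table[contains(@class, "wikitable")]//tr[td]') as $row) {
    $cells = array();
    foreach ($xpath->query('td', $row) as $td) {
        $cells[] = trim($td->nodeValue);
    }
    $data[] = $cells; // appending keeps the outer indexes sequential from 0
}
print_r($data);
```

Using //tr rather than a fixed /table/tr path is a design choice: it gives the same rows whether the loaded document contains tbody or not, at the cost of also matching rows in any nested tables.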