Read all tags using PHP

Time: 2014-09-06 04:50:13

Tags: php html

I have some HTML output from an API, and I want to read all the tags from that output.

Input to the PHP script:

<table bgcolor="white" border="1" cellpadding="0" cellspacing="0" height="290" width="450" bordercolor="dodgerblue" align="center" class="txt">
   <tbody>
      <tr>
         <td>
            <table border="0" cellpadding="0" cellspacing="0" height="288" width="448" bgcolor="#ffffff" bordercolor="darkgray" class="txt">
               <tbody>
                  <tr>
                     <td align="middle"><img height="18" src="/assets/images/dn1.gif" width="28"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/up1.gif" width="28"></td>
                     <td align="middle"><img height="18" src="/assets/images/dn1.gif" width="28"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/up1.gif" width="28"></td>
                  </tr>
                  <tr>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/dn1.gif" width="28"></td>
                     <td align="middle"></td>
                     <td align="middle"><strong><img src="/assets/images/5.gif" width="28" height="18"></strong></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/up1.gif" width="28"></td>
                     <td align="middle"><strong><img src="/assets/images/4.gif" width="28" height="18"></strong></td>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/dn1.gif" width="28"></td>
                     <td align="middle"></td>
                     <td align="middle"></td>
                     <td align="middle"><strong><img src="/assets/images/3.gif" width="28" height="18"></strong></td>
                     <td align="middle"></td>
                     <td align="middle"><img height="18" src="/assets/images/up1.gif" width="28"></td>
                     <td align="middle"></td>
                  </tr>
               </tbody>
            </table>
         </td>
      </tr>
   </tbody>
</table>

I want the script's output to be an array, as described below:

array(
[0] => First td content
[1] => Second td content

... and so on

)

I tried this: http://www.phpclasses.org/package/3022-PHP-Parse-HTML-tables-and-extract-data-into-arrays.html, but it didn't work...

1 Answer:

Answer 0: (score: 2)

If the goal is to grab the @src attribute value of each <img> inside a <td>, while keeping the correct td index, something like this should do it.

Example:

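$dom = new DOMDocument();
// $html is assumed to hold the table markup shown in the question
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only the innermost <td> cells, skipping the outer wrapper cell
$tds = $xpath->query('//td[not(descendant::td)]');
$output = [];

foreach ($tds as $td) {
    // Cells without an <img> stay null, so each array index matches the cell position
    $data = null;
    $sources = $xpath->query('.//img/@src', $td);
    foreach ($sources as $src) {
        $data = $src->value; // if a cell holds several images, the last one wins
    }

    $output[] = $data;
}

var_export($output);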