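XPath returns a wrong HTML attribute

Posted: 2015-02-12 23:29:15

Tags: php curl xpath

So, I have the PHP scraper code and HTML below, and I want to scrape it with XPath. When I try to grab each @href, it shows the outerHTML <a href="\&quot;javascript:cal_action(14,">14</a>, when it is supposed to be <a href="\&quot;javascript:cal_action(14, 2, 2014)\&quot;">14</a>. The @href is cut in half where the space is. What is causing this?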

    $content = $xpath->query('//a');

    foreach ($content as $c) {
        var_dump(htmlspecialchars($c->C14N())); echo '<br>';
    }

The code above is from the cURL scraper. Here is the HTML:

    <div class="outercalendar" id="maincalendar821"><table class="calendarHeader">
    <tbody><tr>
    <td><input type="button" onclick="AjxGetMainCalendarMonth('2', '2015', '821')" value="<"></td>
    <td class="calendarHeader" colspan="5">March 2015</td>
    <td><input type="button" onclick="AjxGetMainCalendarMonth('4', '2015', '821')" value=">"></td>
    </tr>
    </tbody></table>
    <table class="calendar">
    <tbody><tr>
    <td class="calendarDay">S</td>
    <td class="calendarDay">M</td>
    <td class="calendarDay">T</td>
    <td class="calendarDay">W</td>
    <td class="calendarDay">T</td>
    <td class="calendarDay">F</td>
    <td class="calendarDay">S</td>
    </tr>
    <tr>
    <td class="calendar"><a href="javascript:cal_action(1, 3, 2015)">1</a></td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar"><a href="javascript:cal_action(7, 3, 2015)">7</a></td>
    </tr>
    <tr>
    <td class="calendar"><a href="javascript:cal_action(8, 3, 2015)">8</a></td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar"><a href="javascript:cal_action(14, 3, 2015)">14</a></td>
    </tr>
    <tr>
    <td class="calendar"><a href="javascript:cal_action(15, 3, 2015)">15</a></td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar"><a href="javascript:cal_action(21, 3, 2015)">21</a></td>
    </tr>
    <tr>
    <td class="calendar"><a href="javascript:cal_action(22, 3, 2015)">22</a></td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar"><a href="javascript:cal_action(28, 3, 2015)">28</a></td>
    </tr>
    <tr>
    <td class="calendar"><a href="javascript:cal_action(29, 3, 2015)">29</a></td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    <td class="calendar">&nbsp;</td>
    </tr>
    </tbody></table>
    </div>
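The \&quot; at the start of the truncated value hints that the fetched markup may contain backslash-escaped quotes (href=\"...\") rather than plain ones. The question does not confirm this, so treat it as an assumption. An HTML parser only treats an attribute value as quoted when it begins with " or '. A value that begins with a backslash is read as unquoted and ends at the first whitespace, which would produce exactly the cut described above. A minimal PHP sketch using DOMDocument and DOMXPath that reproduces this (the $escaped string and the $hrefs array are hypothetical):

```php
<?php
// Hypothetical input: the link from the question, but with backslash-escaped
// quotes, as if the fetched page contained href=\"...\" instead of href="...".
// In a single-quoted PHP string, \" stays as a literal backslash + quote.
$escaped = '<html><body><a href=\"javascript:cal_action(14, 3, 2015)\">14</a></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($escaped);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
foreach ($xpath->query('//a') as $a) {
    // After '=' the parser sees a backslash, not a quote, so it reads the
    // value as unquoted and stops at the first space.
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```

Under that assumption the stored value stops right after "(14,", the same cut the question reports.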

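1 answer:

Answer (score: 0)

The problem is probably in how the information stored in the tag is structured.

I suggest starting with a more specific XPath:

    //a/@href

So your initial code would be:

    $content = $xpath->query('//a/@href');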
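A self-contained sketch of the suggested query, run against a two-link excerpt of the calendar markup from the question ($html and $hrefs are illustrative names, not from the original code). The expression //a/@href selects DOMAttr nodes, so each result's nodeValue is the attribute text itself rather than serialized markup:

```php
<?php
// Excerpt of the calendar HTML from the question, with normal double quotes.
$html = '<table class="calendar"><tr>'
      . '<td class="calendar"><a href="javascript:cal_action(1, 3, 2015)">1</a></td>'
      . '<td class="calendar"><a href="javascript:cal_action(14, 3, 2015)">14</a></td>'
      . '</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // HTML fragments trigger warnings; collect them quietly
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$hrefs = [];
// '//a/@href' returns attribute nodes; nodeValue is the full href text.
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}
print_r($hrefs);
```

With correctly quoted markup like this excerpt, each value keeps its spaces, e.g. javascript:cal_action(14, 3, 2015).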