So, I have the PHP scraper code below, and some HTML that I want to scrape with XPath.
When I try to grab each @href, the outerHTML comes out as <a href="\"javascript:cal_action(14,">14</a> when it is supposed to be <a href="\"javascript:cal_action(14, 2, 2014)\"">14</a>. The @href is cut in half at the space. What is causing this?
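// $html is assumed to hold the page body returned by cURL (name is illustrative)
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides libxml warnings about imperfect markup
$xpath = new DOMXPath($doc);
$content = $xpath->query('//a');
foreach ($content as $c) {
    var_dump(htmlspecialchars($c->C14N())); echo '<br>';
}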
The code above runs on the page fetched with cURL. Here is the HTML:
<div class="outercalendar" id="maincalendar821"><table class="calendarHeader">
<tbody><tr>
<td><input type="button" onclick="AjxGetMainCalendarMonth('2', '2015', '821')" value="<"></td>
<td class="calendarHeader" colspan="5">March 2015</td>
<td><input type="button" onclick="AjxGetMainCalendarMonth('4', '2015', '821')" value=">"></td>
</tr>
</tbody></table>
<table class="calendar">
<tbody><tr>
<td class="calendarDay">S</td>
<td class="calendarDay">M</td>
<td class="calendarDay">T</td>
<td class="calendarDay">W</td>
<td class="calendarDay">T</td>
<td class="calendarDay">F</td>
<td class="calendarDay">S</td>
</tr>
<tr>
<td class="calendar"><a href="javascript:cal_action(1, 3, 2015)">1</a></td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"><a href="javascript:cal_action(7, 3, 2015)">7</a></td>
</tr>
<tr>
<td class="calendar"><a href="javascript:cal_action(8, 3, 2015)">8</a></td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"><a href="javascript:cal_action(14, 3, 2015)">14</a></td>
</tr>
<tr>
<td class="calendar"><a href="javascript:cal_action(15, 3, 2015)">15</a></td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"><a href="javascript:cal_action(21, 3, 2015)">21</a></td>
</tr>
<tr>
<td class="calendar"><a href="javascript:cal_action(22, 3, 2015)">22</a></td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"><a href="javascript:cal_action(28, 3, 2015)">28</a></td>
</tr>
<tr>
<td class="calendar"><a href="javascript:cal_action(29, 3, 2015)">29</a></td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
</tr>
</tbody></table>
</div>
Answer 0 (score: 0)
The problem may lie in how the information inside the tag is structured.
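One possibility, offered as an assumption rather than something the question confirms: the truncated @href starts with \", which suggests the raw markup contains backslash-escaped quotes (for example, if the calendar HTML is delivered inside a JavaScript or JSON string by the AjxGetMainCalendarMonth request). The HTML parser behind DOMDocument::loadHTML() then sees a value that does not start with a quote, reads it as an unquoted attribute value, and unquoted values end at the first space. The sketch below reproduces that with a hypothetical one-tag $raw string and shows stripslashes() as one possible fix:

```php
<?php
// ASSUMPTION: the fetched markup contains backslash-escaped quotes (e.g. HTML
// delivered inside a JS/JSON string). $raw is a hypothetical one-tag example.
// In a single-quoted PHP string, \" stays as a literal backslash + quote.
$raw = '<a href=\"javascript:cal_action(14, 2, 2014)\">14</a>';

libxml_use_internal_errors(true); // keep libxml parse warnings out of the output

// 1) Parse as-is: the value does not start with a quote, so libxml reads it
//    as an unquoted attribute value, which stops at the first space.
$broken = new DOMDocument();
$broken->loadHTML($raw);
$brokenHref = (new DOMXPath($broken))->query('//a/@href')->item(0)->value;

// 2) Strip the escaping backslashes first, then parse again.
$fixed = new DOMDocument();
$fixed->loadHTML(stripslashes($raw));
$fixedHref = (new DOMXPath($fixed))->query('//a/@href')->item(0)->value;

var_dump($brokenHref); // truncated at the first space
var_dump($fixedHref);  // the full javascript:cal_action(...) call
```

If the response turns out to be JSON, decoding it with json_decode() before parsing is likely the cleaner fix, because stripslashes() also drops backslashes that belong to other escapes such as \n.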
I suggest starting with a more specific XPath:
//a/@href
So your initial code would be:
$content = $xpath->query('//a/@href');
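A minimal, self-contained sketch of that approach, run against a trimmed copy of one calendar row from the question (the $html string is that trimmed copy, not the full page). Each node returned by //a/@href is a DOMAttr, so ->value gives the attribute text directly instead of serializing the whole <a> element through C14N():

```php
<?php
// A trimmed copy of one row of the calendar markup from the question.
$html = '<table class="calendar"><tr>'
    . '<td class="calendar"><a href="javascript:cal_action(1, 3, 2015)">1</a></td>'
    . '<td class="calendar"> </td>'
    . '<td class="calendar"><a href="javascript:cal_action(7, 3, 2015)">7</a></td>'
    . '</tr></table>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the href attribute nodes directly instead of the <a> elements.
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    // $attr is a DOMAttr; ->value is the attribute text, spaces included.
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```

With correctly quoted markup like this, the spaces inside cal_action(...) survive, so if the live page still yields truncated values, the raw cURL response itself is worth inspecting.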