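I'm trying to parse some HTML with phpQuery, but it isn't going smoothly...
I only need to extract the URLs (the href attributes) into an array, but it doesn't work.
Here is the code, for example purposes only: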
$doc = phpQuery::newDocumentHTML('<div align = "left" style="background-color:#FFFFFF;border:1px solid #C3D9FF"> </p>
<table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
<tr>
<td align="left" width="531" height="20"><small>
<strong>
<a href="/1153414/">
<font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Industrial</font><a/> </a></small></strong>
</td>
</tr>
<tr>
<td align="left" vAlign="top" width="100%" height="1">
<table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
Data:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> 4-1-2011 </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Castelo Branco</font></td>
</tr>
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Indústria / Produção </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Isotransfo, Unipessoal LDA</font></td>
</tr>
</table>
</td>
</tr>
</table>
</p>
<table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
<tr>
<td align="left" width="531" height="20"><small>
<strong>
<a href="/1153399/">
<font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Admite-se<a/> </a> </font></small></strong>
</td>
</tr>
<tr>
<td align="left" vAlign="top" width="100%" height="1">
<table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
Data:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> 4-1-2011 </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Castelo Branco</font></td>
</tr>
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Indústria / Produção </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Isotransfo, Unipessoal LDA</font></td>
</tr>
</table>
</td>
</tr>
</table>
</p>
<table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
<tr>
<td align="left" width="531" height="20"><small><font face="Arial">
<strong>
<a href="/1153280/">
<font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Precisa-se</font><a/> </a> </font></small></strong>
</td>
</tr>
<tr>
<td align="left" vAlign="top" width="100%" height="1">
<table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
Data:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> 4-1-2011 </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> ( Todas as Zonas )</font></td>
</tr>
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Saúde / Medicina / Enfermagem </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Emprego Radiologia</font></td>
</tr>
</table>
</td>
</tr>
</table>
</p>
<table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
<tr>
<td align="left" width="531" height="20"><small><font face="Arial">
<strong>
<a href="/1152665/">
<font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Operadores</font><a/> </a> </font></small></strong>
</td>
</tr>
<tr>
<td align="left" vAlign="top" width="100%" height="1">
<table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
Data:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> 4-1-2011 </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Viseu</font></td>
</tr>
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Lojas / Comércio / Balcão </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Dia Portugal Supermercados - Soc. Unip., Lda.</font></td>
</tr>
</table>
</td>
</tr>
</table>
</p>
<table cellPadding="2" cellSpacing="0" width="100%" height="60" style="border-collapse: collapse; ">
<tr>
<td align="left" width="531" height="20"><small><font face="Arial">
<strong>
<a href="/1153524/">
<font style="FONT-SIZE: 13px; LINE-HEIGHT: 14px">Responsável</font><a/> </a> </font></small></strong>
</td>
</tr>
<tr>
<td align="left" vAlign="top" width="100%" height="1">
<table align="left" border="0" cellPadding="0" cellSpacing="0" width="736">
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">
Data:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> 4-1-2011 </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Zona:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Santarem</font></td>
</tr>
<tr>
<td align="left" vAlign="top" width="67">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Categoria:</font></strong></td>
<td align="left" vAlign="top" width="150">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> Comercial / Vendas </font></td>
<td align="left" vAlign="top" width="59">
<font color="#000000" face="Arial" size="2">
<strong style="FONT-SIZE: 11px; LINE-HEIGHT: 14px; font-weight:400">
<font face="Arial" color="#333333" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px">Empresa:</font></strong></td>
<td align="left" vAlign="top" width="473">
<font face="Arial" style="FONT-SIZE: 11px; LINE-HEIGHT: 14px"> ALDI Supermercados Lda.</font></td>
</tr>
</table>
</td>
</tr>
</table>
</div>');
//echo $doc['div table a']->attr('href');
foreach ($doc['div table a'] as $a) {
$hrefs[] .= pq($a)->attr('href');
}
print_r ($hrefs);
If I echo the expression below, it prints just the first href URL, which is fine:
echo $doc['div table a']->attr('href');
If I run the foreach statement, I get an array with some empty values:
foreach ($doc['div table a'] as $a) {
$hrefs[] .= pq($a)->attr('href');
}
print_r ($hrefs);
The array I get is:
Array (
[0] => /1153414/
[1] =>
[2] => /1153399/
[3] =>
[4] => /1153280/
[5] =>
[6] => /1152665/
[7] =>
[8] => /1153524/
[9] =>
)
How can I produce an array like this instead:
Array (
[0] => /1153414/
[1] => /1153399/
[2] => /1153280/
[3] => /1152665/
[4] => /1153524/
)
If you can give me some clues, I would appreciate it.
Sorry for my bad English.
Best regards,
Answer 0 (score: 3)
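Your code contains five instances of <a/>. Each one creates a new, empty a element instead of closing the existing one, and those empty elements have no href, which is where the blank array entries come from. Remove them and your code should work fine.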
Edit: a very simple way to remove empty values from an array is to call array_filter without a second argument:
$hrefs = array_filter($hrefs);
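One detail worth knowing: array_filter keeps the original keys, so the filtered array would be indexed 0, 2, 4, 6, 8 rather than 0 to 4 as requested in the question. A minimal sketch of one way to reindex, assuming the $hrefs array looks exactly like the output shown in the question (the sample data is hard-coded here purely for illustration, and $filtered is a name introduced for this example):

```php
<?php
// Sample data copied from the question's output: every other entry is empty
// because each stray <a/> produced an anchor without an href.
$hrefs = ['/1153414/', '', '/1153399/', '', '/1153280/', '', '/1152665/', '', '/1153524/', ''];

// array_filter() without a callback drops every falsy entry ('' and null,
// but also the string '0') and preserves the original keys: 0, 2, 4, 6, 8.
$filtered = array_filter($hrefs);

// array_values() discards those keys and reindexes from 0.
$hrefs = array_values($filtered);

print_r($hrefs);
```

Because the default filter removes anything falsy, a legitimate href of "0" would be dropped as well; that is not a concern for the URLs in this example, but a callback comparing against '' would be stricter.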
Answer 1 (score: 1)
// Inside the foreach loop: skip anchors that have no href (the empty elements created by <a/>).
$href = pq($a)->attr('href');
if ($href != '') {
$hrefs[] = $href;
}