I'm trying to work this out. I have a structure like this:
<tr>
<td width="10%" bgcolor="#FFFFFF"><font class="bodytext9">17-Aug-2013</font></td>
<td width="4%" bgcolor="#FFFFFF" align=center><font class="bodytext9">Sat</font></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">5 PM</font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="teams.asp?teamno=766&leagueNo=115">XYZ Club FC</a></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/white.gif"></font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">vs</font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/orange.gif"></font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="teams.asp?teamno=632&leagueNo=115">ABC Football Club</a></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a href="pitches.asp?id=151" class=list><u>APSM Pitch </u></a></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><a target="_new" href="matchpreview_frame.asp?matchno=20877"><img src="img/matchpreview_symbol.gif" border="0"></a></td>
</tr>
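This structure repeats many times with different text, and sometimes several rows contain similar text. I only need to extract the first such row that contains "ABC Football Club" (that text can appear many times). How can I do that and extract the text of each of its cells?
Thanks for the comments. I've edited the question to include some code I tried: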
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'url link');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$trs = $xpath->query("//tr/td[contains(., 'ABC Football Club')]"); // double quotes outside, single quotes inside
$rows = array();
foreach($trs as $tr)
$rows[] = innerHTML($tr, true); // this function I don't include here
print_r($rows);
However, this doesn't work! :(
Answer 0 (score: 2)
Find the first TR that contains $needle:
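As written, `//tr/td[...]` returns the matching `<td>` elements rather than the rows that contain them. A minimal sketch of an XPath query that selects the first whole `<tr>` instead, assuming a small hypothetical `$html` sample in place of the page fetched with cURL:

```php
<?php
// Hypothetical sample standing in for the cURL result: two rows mention the club.
$html = '<table>'
    . '<tr><td>17-Aug-2013</td><td>XYZ Club FC</td><td>vs</td><td>ABC Football Club</td></tr>'
    . '<tr><td>24-Aug-2013</td><td>ABC Football Club</td><td>vs</td><td>Other FC</td></tr>'
    . '</table>';

libxml_use_internal_errors(true); // collect HTML parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// tr[td[...]] selects rows that have a matching cell; wrapping the path in
// parentheses and adding [1] keeps only the first such row in document order.
$rows = $xpath->query("(//tr[td[contains(., 'ABC Football Club')]])[1]");

$cells = [];
if ($rows->length > 0) {
    foreach ($rows->item(0)->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent); // plain text of each cell
    }
}
print_r($cells);
```

Note that `//tr[...][1]` without the parentheses would mean "first matching row among its siblings", which can return several rows from different tables.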
$needle = "ABC Football Club";
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides warnings about malformed HTML
$trs = $doc->getElementsByTagName('tr');
$tr_content = ""; // stays empty if no TR matches
foreach($trs as $current_tr)
{
    $tr_content = $doc->saveXML($current_tr);
    if(strpos($tr_content, $needle) !== FALSE)
    {
        break; // first match found
    }
    else
    {
        $tr_content = "";
    }
}
echo $tr_content;
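A self-contained sketch of the same `saveXML()`/`strpos()` loop, wrapped in a function so the "not found" case simply returns an empty string. The function name `first_tr_containing` and the sample HTML are hypothetical:

```php
<?php
/**
 * Return the serialized markup of the first <tr> whose markup contains
 * $needle, or "" if no row does (same idea as the loop above).
 */
function first_tr_containing(string $html, string $needle): string
{
    libxml_use_internal_errors(true); // don't print warnings for sloppy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $markup = $doc->saveXML($tr);
        if (strpos($markup, $needle) !== false) {
            return $markup; // first match in document order wins
        }
    }
    return "";
}

// Hypothetical sample: both rows mention the club, only the first is wanted.
$html = '<table>'
    . '<tr><td>17-Aug-2013</td><td>ABC Football Club</td></tr>'
    . '<tr><td>24-Aug-2013</td><td>ABC Football Club</td></tr>'
    . '</table>';

echo first_tr_containing($html, 'ABC Football Club'), PHP_EOL;
```

Because the needle is matched against serialized markup, it can also match text inside attributes, and a needle containing `&` would have to be written as `&amp;` to match.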
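To find the first TR that contains $needle and, if TRs are nested, the innermost TR closest to the needle, repeat the search inside each matching TR: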
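$needle = "ABC Football Club";
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ hides warnings about malformed HTML
$found_tr = NULL;
$found_tr_content = ""; // stays empty if no TR matches
$node = $doc;
do
{
    // Search the descendants of the last match (initially the whole document).
    $trs = $node->getElementsByTagName('tr');
    $node = NULL;
    foreach($trs as $current_tr)
    {
        $tr_content = $doc->saveXML($current_tr);
        if(strpos($tr_content, $needle) !== FALSE)
        {
            $node = $current_tr; // descend into this TR on the next pass
            $found_tr = $node;
            $found_tr_content = $tr_content;
            break;
        }
    }
} while($node);
echo $found_tr_content;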
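A runnable sketch of the "descend into the match" loop on a hypothetical nested table, where the outer row also contains the needle but the inner row is the one wanted. The helper name `innermost_tr_containing` is illustrative only:

```php
<?php
/**
 * Hypothetical helper: follow matching <tr> elements downwards and return
 * the innermost one whose markup contains $needle, or null if none does.
 */
function innermost_tr_containing(string $html, string $needle): ?DOMElement
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $found = null;
    $node = $doc; // start the search at the document root
    while ($node !== null) {
        $next = null;
        // getElementsByTagName() only returns descendants, so each pass goes deeper.
        foreach ($node->getElementsByTagName('tr') as $tr) {
            if (strpos($doc->saveXML($tr), $needle) !== false) {
                $next = $tr;
                break;
            }
        }
        if ($next !== null) {
            $found = $next; // remember the deepest match so far
        }
        $node = $next; // stop once no deeper row matches
    }
    return $found;
}

// Hypothetical nested sample: the outer row wraps an inner table.
$html = '<table><tr><td>Fixtures</td><td>'
    . '<table><tr><td>17-Aug-2013</td><td>ABC Football Club</td></tr></table>'
    . '</td></tr></table>';

$tr = innermost_tr_containing($html, 'ABC Football Club');
echo $tr === null ? 'not found' : $tr->ownerDocument->saveXML($tr), PHP_EOL;
```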
Answer 1 (score: 1)
In phpQuery you would write:
$dom = phpQuery::newDocument($html);
$dom->find('tr:has(> td:contains("ABC Football Club"))')->eq(0);
Answer 2 (score: 0)
To get the TDs of the first TR, you can use:
$doc = new DOMDocument();
$doc->loadHTML($html);
$trs = $doc->getElementsByTagName('tr');
$td_of_the_first_tr = $trs->item(0)->getElementsByTagName('td');
foreach($td_of_the_first_tr as $current_td)
{
echo $doc->saveXML($current_td) . PHP_EOL;
}
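The loop above always takes `item(0)`, the first TR in the document. A sketch that applies the same per-TD loop to the first TR whose text contains the needle, using the row from the question (wrapped in a `<table>`, plus a second hypothetical matching row) and returning plain text via `textContent` instead of markup:

```php
<?php
$html = <<<'HTML'
<table>
<tr>
<td width="10%" bgcolor="#FFFFFF"><font class="bodytext9">17-Aug-2013</font></td>
<td width="4%" bgcolor="#FFFFFF" align=center><font class="bodytext9">Sat</font></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">5 PM</font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="teams.asp?teamno=766&leagueNo=115">XYZ Club FC</a></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/white.gif"></font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9">vs</font></td>
<td width="5%" bgcolor="#FFFFFF" align="center"></td>
<td width="5%" bgcolor="#FFFFFF" align="center"><font class="bodytext9"><img src="img/colors/orange.gif"></font></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a class="black_9" href="teams.asp?teamno=632&leagueNo=115">ABC Football Club</a></td>
<td width="15%" bgcolor="#FFFFFF" align="center"><a href="pitches.asp?id=151" class=list><u>APSM Pitch </u></a></td>
<td width="4%" bgcolor="#FFFFFF" align="center"><a target="_new" href="matchpreview_frame.asp?matchno=20877"><img src="img/matchpreview_symbol.gif" border="0"></a></td>
</tr>
<tr><td>24-Aug-2013</td><td><a class="black_9" href="#">ABC Football Club</a></td></tr>
</table>
HTML;

libxml_use_internal_errors(true); // the hrefs contain bare "&", which libxml warns about
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$needle = 'ABC Football Club';
$cells = [];
foreach ($doc->getElementsByTagName('tr') as $tr) {
    // Matching on textContent ignores attribute values such as href.
    if (strpos($tr->textContent, $needle) === false) {
        continue;
    }
    foreach ($tr->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent); // '' for empty or image-only cells
    }
    break; // only the first matching row is wanted
}

// Drop the empty strings left by blank and image-only cells.
$text = array_values(array_filter($cells, function ($s) { return $s !== ''; }));
print_r($text);
```

Keeping the unfiltered `$cells` array preserves column positions (index 9 is always the away team), which may be more useful than the filtered list if the layout is fixed.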