I have an HTML page, and to extract text from it I use code written in PHP 7. This is the result I get:
<div class="table_container p402_hide " id="div_Summer">
<table class=" stats_table" id="Summer">
<colgroup><col><col><col><col><col><col><col><col><col></colgroup>
<thead>
<tr class="">
<th data-stat="year" align="right" class=" sort_default_asc" >Year</th>
<th data-stat="city" align="left" class=" sort_default_asc" >City</th>
<th data-stat="country" align="left" class=" sort_default_asc" >Country</th>
<th data-stat="countries" align="right" class="" >Countries</th>
<th data-stat="participants" align="right" class="" >Participants</th>
<th data-stat="participants_men" align="right" class="" >Men</th>
<th data-stat="participants_women" align="right" class="" >Women</th>
<th data-stat="sports" align="right" class="" >Sports</th>
<th data-stat="events" align="right" class="" >Events</th>
</tr>
</thead>
<tbody>
<tr class="">
<td align="right" ><a href="/olympics/summer/2012/">2012</a></td>
<td align="left" csk="London:2012">London</td>
<td align="left" csk="Great Britain:2012">Great Britain</td>
<td align="right" >205</td>
<td align="right" >10,519</td>
<td align="right" >5,864</td>
<td align="right" >4,655</td>
<td align="right" >32</td>
<td align="right" >302</td>
</tr>
I only want the text, such as "2012" and "London". How can I extract this information from $result?
Answer 0 (score: 0)
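Have you tried querying the td elements you are interested in directly?
Try a more specific XPath expression, such as this one, which selects the first two cells of every row: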
$result = $xpath->query('(//div[@id="div_Summer"]//tbody//tr//td[position() >= 1 and position() <= 2])');
Then process them with a simple loop:
<?php
foreach ($result as $element) {
var_dump($element->nodeValue);
}
?>
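The predicate in the expression above keeps the td elements whose position among their siblings is 1 or 2. Since positions start at 1, the `position() >= 1` part should be redundant. The following is a minimal sketch that checks this on a tiny hypothetical table (the inline HTML is an assumption made for illustration, not the real page), and uses `libxml_use_internal_errors()` to keep parser warnings quiet:

```php
<?php
// Tiny hypothetical table, just enough to exercise the predicate.
$html = '<div id="div_Summer"><table><tbody>'
      . '<tr><td>2012</td><td>London</td><td>Great Britain</td></tr>'
      . '<tr><td>2008</td><td>Beijing</td><td>China</td></tr>'
      . '</tbody></table></div>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$base = '//div[@id="div_Summer"]//tbody//tr//td';

// Predicate exactly as written in the answer.
$long = [];
foreach ($xpath->query($base . '[position() >= 1 and position() <= 2]') as $td) {
    $long[] = $td->nodeValue;
}

// Shorter predicate that should select the same cells.
$short = [];
foreach ($xpath->query($base . '[position() <= 2]') as $td) {
    $short[] = $td->nodeValue;
}

// Compare the two result lists.
var_dump($long === $short);
```

Either form works; the shorter one is only a matter of taste.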
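Full example, based on your code:
<?php
// Download the page that contains the table.
$html = file_get_contents('http://www.sports-reference.com/olympics/summer/');
// Report only fatal and parse errors, which hides the warnings loadHTML() raises on imperfect markup.
error_reporting(E_ERROR | E_PARSE);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
// Select the first two cells (year and city) of every row in the Summer table.
$result = $xpath->query('(//div[@id="div_Summer"]//tbody//tr//td[position() >= 1 and position() <= 2])');
foreach ($result as $element) {
    var_dump($element->nodeValue);
}
?>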
Output (truncated):
string(4) "2012"
string(6) "London"
string(4) "2008"
string(7) "Beijing"
string(4) "2004"
[..]
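The output above is a flat list in which years and cities alternate. If you would rather keep each year together with its city, one option is to query the rows first and then read the first two cells relative to each row. This is only a sketch under stated assumptions: it runs against an inline copy of part of the table from the question instead of the live page, and the `$games` array and its `year` and `city` keys are names made up for illustration.

```php
<?php
// Inline copy of part of the table from the question, so the sketch runs offline.
$html = <<<'EOT'
<div class="table_container p402_hide " id="div_Summer">
<table class=" stats_table" id="Summer">
<tbody>
<tr class="">
<td align="right" ><a href="/olympics/summer/2012/">2012</a></td>
<td align="left" csk="London:2012">London</td>
<td align="left" csk="Great Britain:2012">Great Britain</td>
<td align="right" >205</td>
</tr>
</tbody>
</table>
</div>
EOT;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Walk the rows, then read the first two cells relative to each row.
$games = [];
foreach ($xpath->query('//div[@id="div_Summer"]//tbody/tr') as $row) {
    // string(...) returns the text content of the first matching node.
    $year = trim($xpath->evaluate('string(td[1])', $row));
    $city = trim($xpath->evaluate('string(td[2])', $row));
    $games[] = ['year' => $year, 'city' => $city];
}

print_r($games);
```

Passing `$row` as the second argument makes the relative paths `td[1]` and `td[2]` resolve against that row, so each pair stays together even if a row has missing cells.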