I'm using PHP to scrape data from a website, and I'm trying to build models from that data. Here is my current code:
$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");
for($i = 0; $i < $rows->length; $i++) {
    $stats = $table->getElementsByTagName("td");
    $name = $stats->item($i)->getAttribute('player');
    $age = $stats->item($i)->getAttribute('age');
    $plateAppearances = $stats->item($i)->getAttribute('PA');
    $atBats = $stats->item($i)->getAttribute('AB');
    $hits = $stats->item($i)->getAttribute('H');
    $doubles = $stats->item($i)->getAttribute('2B');
    $triples = $stats->item($i)->getAttribute('3B');
    $homeruns = $stats->item($i)->getAttribute('HR');
    $walks = $stats->item($i)->getAttribute('BB');
    $strikeouts = $stats->item($i)->getAttribute('SO');
    $name = $stats->item(0)->textContent;
    $age = $stats->item(1)->textContent;
    $plateAppearances = $stats->item(3)->textContent;
    $atBats = $stats->item(4)->textContent;
    $hits = $stats->item(6)->textContent;
    $doubles = $stats->item(7)->textContent;
    $triples = $stats->item(8)->textContent;
    $homeruns = $stats->item(9)->textContent;
    $walks = $stats->item(13)->textContent;
    $strikeouts = $stats->item(14)->textContent;
    $player = new Player([
        'name' => $name,
        'age' => $age,
        'plateAppearances' => $plateAppearances,
        'atBats' => $atBats,
        'hits' => $hits,
        'doubles' => $doubles,
        'triples' => $triples,
        'homeruns' => $homeruns,
        'walks' => $walks,
        'strikeouts' => $strikeouts
    ]);
    echo $player;
    echo '<br>';
}
This retrieves all the attributes I want, but it only outputs the first player, repeated 19 times (the total number of rows), like this:
{"name":"Miguel Amaya","age":"19","plateAppearances":"241","atBats":"212","hits":"61","doubles":"14","triples":"2","homeruns":"9","walks":"24","strikeouts":"53"}
How can I retrieve every player in the table, not just the first one, and create a Player model for each of them?
Edit/Update: added part of the table I'm extracting the data from:
<tr ><th scope="row" class="right " data-stat="ranker" >1</th><td class="left " data-append-csv="player.fcgi?id=amaya-000mig" data-stat="player" csk="Amaya,Miguel" ><a href="/register/player.fcgi?id=amaya-000mig">Miguel Amaya</a></td><td class="right " data-stat="age" >19</td><td class="right " data-stat="G" >59</td><td class="right " data-stat="PA" >241</td><td class="right " data-stat="AB" >212</td><td class="right " data-stat="R" >29</td><td class="right " data-stat="H" >61</td><td class="right " data-stat="2B" >14</td><td class="right " data-stat="3B" >2</td><td class="right " data-stat="HR" >9</td><td class="right " data-stat="RBI" >33</td><td class="right " data-stat="SB" >0</td><td class="right " data-stat="CS" >0</td><td class="right " data-stat="BB" >24</td><td class="right " data-stat="SO" >53</td><td class="right " data-stat="batting_avg" >.288</td><td class="right " data-stat="onbase_perc" >.365</td><td class="right " data-stat="slugging_perc" >.500</td><td class="right " data-stat="onbase_plus_slugging" >.865</td><td class="right " data-stat="TB" >106</td><td class="right " data-stat="GIDP" >3</td><td class="right " data-stat="HBP" >3</td><td class="right " data-stat="SH" >0</td><td class="right " data-stat="SF" >2</td><td class="right " data-stat="IBB" >2</td><td class="right " data-stat="notes" ></td></tr>
Answer 0 (score: 2)
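The problem is that $stats does not get the <td> elements from the current row of the loop. You are setting it to all of the <td> elements in the table, so item(0), item(1), and so on always point into the first row. Change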
$stats = $table->getElementsByTagName("td");
to:
$stats = $rows[$i]->getElementsByTagName("td");
Then get rid of all the assignments that use item($i). $i is an index into $rows and has nothing to do with $stats.
Also, you need to skip the header row of the table, because it doesn't contain any <td> elements. Use this to get only the rows inside <tbody> and skip <thead>:
$rows = $table->getElementsByTagName("tbody")->item(0)->getElementsByTagName("tr");
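Putting both changes together, a minimal self-contained sketch might look like the following. It parses a small inline sample instead of fetching the live page, and it makes a few assumptions: the sample keeps only the first few columns of the row shown in the question, the second row ("Example Player") is a made-up placeholder, and plain arrays stand in for the question's Player model, whose definition is not shown.

```php
<?php
// Inline sample standing in for the real page (assumption: same column order
// as the row in the question; the second row is a made-up placeholder).
$html = <<<HTML
<table id="team_batting">
<thead><tr><th>Rk</th><th>Name</th><th>Age</th><th>G</th><th>PA</th></tr></thead>
<tbody>
<tr><th data-stat="ranker">1</th><td data-stat="player">Miguel Amaya</td><td data-stat="age">19</td><td data-stat="G">59</td><td data-stat="PA">241</td></tr>
<tr><th data-stat="ranker">2</th><td data-stat="player">Example Player</td><td data-stat="age">21</td><td data-stat="G">40</td><td data-stat="PA">150</td></tr>
</tbody>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$table = $dom->getElementById('team_batting');
// Only the rows inside <tbody>, so the header row is skipped.
$rows = $table->getElementsByTagName("tbody")->item(0)->getElementsByTagName("tr");

$players = [];
for ($i = 0; $i < $rows->length; $i++) {
    // <td> cells of the current row only, not of the whole table.
    $stats = $rows->item($i)->getElementsByTagName("td");
    $players[] = [
        'name'             => $stats->item(0)->textContent,
        'age'              => $stats->item(1)->textContent,
        'plateAppearances' => $stats->item(3)->textContent,
    ];
}

echo json_encode($players), PHP_EOL;
```

In the real script, each array would be passed to new Player([...]) as in the question, and the remaining columns would be read from the same per-row $stats list.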
Answer 1 (score: 1)
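On every pass through the loop you are selecting all of the TD tags in the table. What you want is to scan only one row at a time. I suggest changing the loop to a foreach that takes each row as its context, and then looking up the TDs only within that row. The code is incomplete, but it should go along these lines: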
$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");
foreach($rows as $row){
    // Only the cells that belong to this row
    $cols=$row->getElementsByTagName("td");
    foreach($cols as $col){
        // The data-stat attribute identifies which statistic the cell holds
        $type=$col->getAttribute('data-stat');
        if($type=='player') $name=$col->textContent;
        elseif($type=='age') $age=$col->textContent;
        ...
    }
    $player=new Player([
        ...
    ]);
}
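This is only a code outline that tries to follow your style. Rather than extracting the columns into separate variables, though, it can be done more efficiently by collecting them into an associative array.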
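One way the associative-array idea could look is sketched below. This is an illustration under stated assumptions rather than a finished implementation: the $fields map (data-stat value to model field name) is introduced here for illustration, the inline HTML is a shortened version of the row from the question plus a made-up placeholder row, and the Player model itself is left out because its definition is not shown.

```php
<?php
// Inline sample (assumption: shortened row from the question; the second
// row is a made-up placeholder).
$html = <<<HTML
<table id="team_batting">
<thead><tr><th>Rk</th><th>Name</th><th>Age</th><th>G</th><th>PA</th><th>AB</th></tr></thead>
<tbody>
<tr><th data-stat="ranker">1</th><td data-stat="player">Miguel Amaya</td><td data-stat="age">19</td><td data-stat="G">59</td><td data-stat="PA">241</td><td data-stat="AB">212</td></tr>
<tr><th data-stat="ranker">2</th><td data-stat="player">Example Player</td><td data-stat="age">21</td><td data-stat="G">40</td><td data-stat="PA">150</td><td data-stat="AB">130</td></tr>
</tbody>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$table = $dom->getElementById('team_batting');

// Hypothetical map: data-stat attribute value => field name used by the model.
$fields = [
    'player' => 'name',
    'age'    => 'age',
    'PA'     => 'plateAppearances',
    'AB'     => 'atBats',
];

$records = [];
foreach ($table->getElementsByTagName('tr') as $row) {
    $record = [];
    foreach ($row->getElementsByTagName('td') as $col) {
        $type = $col->getAttribute('data-stat');
        if (isset($fields[$type])) {
            $record[$fields[$type]] = trim($col->textContent);
        }
    }
    // Header rows contain no <td>, so they produce an empty record; skip them.
    if ($record) {
        $records[] = $record;
    }
}

echo json_encode($records), PHP_EOL;
```

Each entry in $records could then be handed to new Player($record). Because cells are matched by their data-stat attribute instead of by position, the lookup does not depend on the column order of the table.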