Question

I'm using PHP to scrape data from a website, and I'm trying to build models from that data. Here is my current code:

$dom = new DOMDocument();
$html = file_get_contents('https://www.baseball-reference.com/register/team.cgi?id=41270199');
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");

for($i = 0; $i < $rows->length; $i++) {

    $stats = $table->getElementsByTagName("td");

    $name = $stats->item($i)->getAttribute('player');
    $age = $stats->item($i)->getAttribute('age');
    $plateAppearances = $stats->item($i)->getAttribute('PA');
    $atBats = $stats->item($i)->getAttribute('AB');
    $hits = $stats->item($i)->getAttribute('H');
    $doubles = $stats->item($i)->getAttribute('2B');
    $triples = $stats->item($i)->getAttribute('3B');
    $homeruns = $stats->item($i)->getAttribute('HR');
    $walks = $stats->item($i)->getAttribute('BB');
    $strikeouts = $stats->item($i)->getAttribute('SO');

    $name = $stats->item(0)->textContent;
    $age = $stats->item(1)->textContent;
    $plateAppearances = $stats->item(3)->textContent;
    $atBats = $stats->item(4)->textContent;
    $hits = $stats->item(6)->textContent;
    $doubles = $stats->item(7)->textContent;
    $triples = $stats->item(8)->textContent;
    $homeruns = $stats->item(9)->textContent;
    $walks = $stats->item(13)->textContent;
    $strikeouts = $stats->item(14)->textContent;

    $player = new Player([
        'name' => $name, 
        'age' => $age, 
        'plateAppearances' => $plateAppearances,
        'atBats' => $atBats,
        'hits' => $hits,
        'doubles' => $doubles,
        'triples' => $triples,
        'homeruns' => $homeruns,
        'walks' => $walks,
        'strikeouts' => $strikeouts
    ]);

    echo $player;
    echo '<br>';

}

This retrieves all the attributes I want, but it only outputs the first player, repeated 19 times (the total number of rows), like this:

{"name":"Miguel Amaya","age":"19","plateAppearances":"241","atBats":"212","hits":"61","doubles":"14","triples":"2","homeruns":"9","walks":"24","strikeouts":"53"}

How can I retrieve every player in the table, rather than just the first one, and create a Player model for each of them?

Edit/Update: added a sample of the table I'm pulling data from

<tr ><th scope="row" class="right " data-stat="ranker" >1</th><td class="left " data-append-csv="player.fcgi?id=amaya-000mig" data-stat="player" csk="Amaya,Miguel" ><a href="/register/player.fcgi?id=amaya-000mig">Miguel Amaya</a></td><td class="right " data-stat="age" >19</td><td class="right " data-stat="G" >59</td><td class="right " data-stat="PA" >241</td><td class="right " data-stat="AB" >212</td><td class="right " data-stat="R" >29</td><td class="right " data-stat="H" >61</td><td class="right " data-stat="2B" >14</td><td class="right " data-stat="3B" >2</td><td class="right " data-stat="HR" >9</td><td class="right " data-stat="RBI" >33</td><td class="right " data-stat="SB" >0</td><td class="right " data-stat="CS" >0</td><td class="right " data-stat="BB" >24</td><td class="right " data-stat="SO" >53</td><td class="right " data-stat="batting_avg" >.288</td><td class="right " data-stat="onbase_perc" >.365</td><td class="right " data-stat="slugging_perc" >.500</td><td class="right " data-stat="onbase_plus_slugging" >.865</td><td class="right " data-stat="TB" >106</td><td class="right " data-stat="GIDP" >3</td><td class="right " data-stat="HBP" >3</td><td class="right " data-stat="SH" >0</td><td class="right " data-stat="SF" >2</td><td class="right " data-stat="IBB" >2</td><td class="right " data-stat="notes" ></td></tr>

Answer 1

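The problem is that $stats doesn't get the <td> elements from the current row of the loop. You're setting it to all of the <td> elements in the table. Change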

$stats = $table->getElementsByTagName("td");

to:

$stats = $rows[$i]->getElementsByTagName("td");

Then get rid of all the assignments that use item($i). $i is an index into $rows, so it has nothing to do with positions in $stats.

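In addition, you need to skip the table's header row, because it doesn't contain any <td> elements. Use this to get only the rows inside <tbody> and skip <thead>: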

$rows = $table->getElementsByTagName("tbody")->item(0)->getElementsByTagName("tr");
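Putting both changes together, the corrected loop might look like the sketch below. It is only an illustration under a few assumptions: a small inline HTML sample stands in for the fetched page (the second player is made up), only two columns are read, and plain arrays are collected instead of constructing the asker's Player class. It uses $rows->item($i), the documented accessor, where the answer writes $rows[$i].

```php
<?php
// Inline sample standing in for the fetched page (assumed structure,
// modelled on the row shown in the question; "Example Player" is made up).
$html = <<<HTML
<table id="team_batting">
  <thead><tr><th>Rk</th><th>Name</th><th>Age</th></tr></thead>
  <tbody>
    <tr><th data-stat="ranker">1</th><td data-stat="player">Miguel Amaya</td><td data-stat="age">19</td></tr>
    <tr><th data-stat="ranker">2</th><td data-stat="player">Example Player</td><td data-stat="age">21</td></tr>
  </tbody>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$table = $dom->getElementById('team_batting');

// Only the rows inside <tbody>, so the header row is skipped.
$rows = $table->getElementsByTagName('tbody')->item(0)->getElementsByTagName('tr');

$players = [];
for ($i = 0; $i < $rows->length; $i++) {
    // The <td> cells of the current row only, not of the whole table.
    $stats = $rows->item($i)->getElementsByTagName('td');

    // Column positions are fixed within a row; $i is not used here.
    $players[] = [
        'name' => $stats->item(0)->textContent,
        'age'  => $stats->item(1)->textContent,
    ];
}

echo json_encode($players), PHP_EOL;
```

In the real script, each array would be passed to new Player([...]) as in the question, with the remaining column indexes (3, 4, 6, and so on) filled in the same way.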

Answer 2

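On every pass, the loop selects all of the table's TD tags. What you want is to scan one row at a time. I suggest changing the loop to a foreach, taking each row as the context, and then looking up only the TDs within that row. The code is incomplete, but it should go along these lines: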

$table = $dom->getElementByID('team_batting');
$rows = $table->getElementsByTagName("tr");
foreach($rows as $row){
  $cols=$row->getElementsByTagName("td");
  foreach($cols as $col){
    $type=$col->getAttribute('data-stat');
    if($type=='player') $name=$col->textContent;
    elseif($type=='age') $age=$col->textContent;
    ...
  }
  $player=new Player([
  ...
  ]);
}

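This is just an outline that tries to follow your style. Rather than extracting the columns into separate variables, though, you can do it more efficiently by collecting them in an associative array.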
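As a rough sketch of that associative-array idea: each row's cells can be keyed by their data-stat attribute and then mapped onto the model's field names. The helper rowToStats and the $fieldMap array are hypothetical names introduced here for illustration, the inline HTML is a trimmed stand-in for the real page, and the Player constructor call is left as a comment because that class belongs to the asker.

```php
<?php
// Hypothetical helper: collect one row's <td> cells into an array
// keyed by each cell's data-stat attribute.
function rowToStats(DOMElement $row): array
{
    $stats = [];
    foreach ($row->getElementsByTagName('td') as $col) {
        $stats[$col->getAttribute('data-stat')] = trim($col->textContent);
    }
    return $stats;
}

// Assumed mapping from data-stat names to the model's field names.
$fieldMap = [
    'player' => 'name',    'age' => 'age',
    'PA' => 'plateAppearances', 'AB' => 'atBats',
    'H'  => 'hits',        '2B' => 'doubles',
    '3B' => 'triples',     'HR' => 'homeruns',
    'BB' => 'walks',       'SO' => 'strikeouts',
];

// Trimmed inline sample standing in for the fetched page.
$html = <<<HTML
<table id="team_batting">
  <thead><tr><th data-stat="ranker">Rk</th><th data-stat="player">Name</th></tr></thead>
  <tbody>
    <tr><th data-stat="ranker">1</th><td data-stat="player">Miguel Amaya</td><td data-stat="age">19</td><td data-stat="PA">241</td><td data-stat="AB">212</td></tr>
  </tbody>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$table = $dom->getElementById('team_batting');

$players = [];
foreach ($table->getElementsByTagName('tr') as $row) {
    $stats = rowToStats($row);
    if ($stats === []) {
        continue; // header rows contain only <th> cells
    }
    $fields = [];
    foreach ($fieldMap as $stat => $field) {
        $fields[$field] = $stats[$stat] ?? null; // null when a column is absent
    }
    $players[] = $fields; // in the asker's code: new Player($fields)
}

echo json_encode($players), PHP_EOL;
```

Looking cells up by data-stat rather than by position means the code keeps working if the site adds or reorders columns, and skipping rows without any <td> cells handles header rows without relying on <tbody>.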

Why does my for loop only grab the first element?
