Wrong results using Simple_HTML_Dom

Date: 2009-07-25 03:49:12

Tags: php screen-scraping

I'm trying to scrape this page: http://www.acttab.com.au/interbet/venues?day=today

Here is my code:

function FindRaceRows($html) {
    foreach ($rows = $html->find(
        'tr[bgcolor="#ffffff"], tr[bgcolor="#cccccc"]') as
        $row);
        {
        echo $row->plaintext . "END ROW<br />\n";

        foreach ($row->find('td[align=center]') as $cell) {

            //echo $cell->bgcolor;

            //black
            if ($cell->bgcolor == "#000000") {
                echo "Already run";
            }

            //blue
            if ($cell->bgcolor == "#0000ff") {
                echo "Next race for type";
            }

            //green
            if ($cell->bgcolor == "#00cc00") {
                echo "Still to jump";
            }

            //Red
            if ($cell->bgcolor == "#cc0000") {
                echo "Next race for meeting";
            }

            foreach ($cell->find('a') as $tag); {
                $link = $tag->href;

                $eventIx = strpos($link, "mting=");

                if ($eventIx != -1) {
                    $event = substr($link, $eventIx + 6);
                    //echo $event."<br />\n";
                    $url =
                        "http://www.acttab.com.au/interbet/odds?mting="
                        . $event;

                    echo $url . "<br />\n";
                }
            }
        }
    }
}

$url = "http://www.acttab.com.au/interbet/venues?day=today";
$html = file_get_html($url);

FindRaceRows($html);

But it doesn't separate the rows: the $row variable ends up holding a whole bunch of rows at once.

Here is some of the output (note that "END ROW" does not appear at the end of each row):

AR MORPHETTVILLE FINE/DEAD R2@ 1:10pm 1 2 3 4 5 6 7 8   BR DOOMBEN FINE/GOOD R3@ 1:30pm 1 2 3 4 5 6 7 8   CR TOOWOOMBA FINE/GOOD R1@ 5:08pm 1 2 3 4 5 6 7   CT OTAKI NZ FINE/HVY R8@ 1:01pm 1 2 3 4 5 6 7 8 9 10   DR TOWNSVILLE FINE/GOOD R3@ 1:15pm 1 2 3 4 5 6 7 8   DT TE RAPA NZ FINE/SLOW R6@ 1:15pm 1 2 3 4 5 6 7 8   MR MOONEE VALLEY OCAST/DEAD R2@ 1:05pm 1 2 3 4 5 6 7 8   NR NEWCASTLE FINE/SLOW R3@ 1:35pm 1 2 3 4 5 6 7 8   SR RANDWICK FINE/HVY R3@ 1:20pm 1 2 3 4 5 6 7 8   VR DONALD FINE/DEAD R3@ 1:25pm 1 2 3 4 5 6 7 8   XR BELMONT FINE/DEAD R1@ 2:25pm 1 2 3 4 5 6 7 8     HARNESS MEETINGS AT GLOBE DERBY FINE/GOOD R1@ 6:13pm 1 2 3 4 5 6 7 8 9 10   BT ALBION PARK FINE/GOOD R1@ 5:23pm 1 2 3 4 5 6 7 8 9 10   MT BALLARAT OCAST/GOOD R1@ 7:02pm 1 2 3 4 5 6 7 8   NT PARKES FINE/FAST R1@ 5:12pm 1 2 3 4 5 6   ST NEWCASTLE FINE/FAST R1@ 6:35pm 1 2 3 4 5 6 7 8   XT GLOUCESTER PARK FINE/GOOD R1@ 8:45pm 1 2 3 4 5     GREYHOUND MEETINGS MD THE MEADOWS FINE/GOOD R1@ 7:20pm 1 2 3 4 5 6 7 8 9 10 11   ND THE GARDENS FINE/GOOD R1@ 5:04pm 1 2 3 4 5 6 7 8   SD WENTWORTH PARK FINE/GOOD R1@ 7:27pm 1 2 3 4 5 6 7 8 9 10   XD CANNINGTON FINE/GOOD R1@ 9:05pm 1 2 3 4 5 6   END ROW
http://www.acttab.com.au/interbet/odds?mting=XD06000
BR DOOMBEN FINE/GOOD R3@ 1:30pm 1 2 3 4 5 6 7 8   CR TOOWOOMBA FINE/GOOD R1@ 5:08pm 1 2 3 4 5 6 7   CT OTAKI NZ FINE/HVY R8@ 1:01pm 1 2 3 4 5 6 7 8 9 10   DR TOWNSVILLE FINE/GOOD R3@ 1:15pm 1 2 3 4 5 6 7 8   DT TE RAPA NZ FINE/SLOW R6@ 1:15pm 1 2 3 4 5 6 7 8   MR MOONEE VALLEY OCAST/DEAD R2@ 1:05pm 1 2 3 4 5 6 7 8   NR NEWCASTLE FINE/SLOW R3@ 1:35pm 1 2 3 4 5 6 7 8   SR RANDWICK FINE/HVY R3@ 1:20pm 1 2 3 4 5 6 7 8   VR DONALD FINE/DEAD R3@ 1:25pm 1 2 3 4 5 6 7 8   XR BELMONT FINE/DEAD R1@ 2:25pm 1 2 3 4 5 6 7 8     HARNESS MEETINGS AT GLOBE DERBY FINE/GOOD R1@ 6:13pm 1 2 3 4 5 6 7 8 9 10   BT ALBION PARK FINE/GOOD R1@ 5:23pm 1 2 3 4 5 6 7 8 9 10   MT BALLARAT OCAST/GOOD R1@ 7:02pm 1 2 3 4 5 6 7 8   NT PARKES FINE/FAST R1@ 5:12pm 1 2 3 4 5 6   ST NEWCASTLE FINE/FAST R1@ 6:35pm 1 2 3 4 5 6 7 8   XT GLOUCESTER PARK FINE/GOOD R1@ 8:45pm 1 2 3 4 5     GREYHOUND MEETINGS MD THE MEADOWS FINE/GOOD R1@ 7:20pm 1 2 3 4 5 6 7 8 9 10 11   ND THE GARDENS FINE/GOOD R1@ 5:04pm 1 2 3 4 5 6 7 8   SD WENTWORTH PARK FINE/GOOD R1@ 7:27pm 1 2 3 4 5 6 7 8 9 10   XD CANNINGTON FINE/GOOD R1@ 9:05pm 1 2 3 4 5 6   END ROW

1 answer:

Answer 0 (score: 0)

The problem isn't Simple_HTML_Dom, it's your code.

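You put a semicolon (;) right after two of your foreach statements: the one ending in "as $row);" and the one ending in "as $tag);".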

When you use foreach, you must not put a semicolon (;) after the statement. Consider the following example:

$array = array(1,2,3,4);
foreach($array as $value) {
    echo $value;
}

The above outputs "1234".

Now let's see what happens if we put a semicolon (;) after the statement:

$array = array(1,2,3,4);
foreach($array as $value); {
    echo $value;
}

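The above outputs "4". The semicolon is an empty statement, and PHP treats it as the entire loop body: the loop runs once per element and does nothing. The braced block that follows is not part of the loop at all; it is an ordinary block that runs once, after the loop has finished, when $value still holds the last value from the array.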
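To make the mechanism concrete, here is a small self-contained sketch. It contrasts the two forms from the examples above and adds a loop shaped like the outer loop of FindRaceRows. The plain strings passed to renderRows are a hypothetical stand-in for the row elements that Simple_HTML_Dom's find() returns (and for their plaintext), so the library isn't needed to run it; the function names are illustrative only.

```php
<?php
// Sketch only: plain strings stand in (hypothetically) for Simple_HTML_Dom
// row elements, so no library is needed to run this.

// With the stray semicolon, the empty statement ";" is the whole loop body.
// The braced block after it is a separate block that runs once, after the
// loop, when $value still holds the last element.
function withSemicolon(array $items): string
{
    $out = '';
    foreach ($items as $value);
    {
        $out .= $value;
    }
    return $out;
}

// Without the semicolon, the braced block is the loop body and runs once
// per element.
function withoutSemicolon(array $items): string
{
    $out = '';
    foreach ($items as $value) {
        $out .= $value;
    }
    return $out;
}

// Shaped like the outer loop of FindRaceRows with the semicolon removed:
// "END ROW" is appended after every row, not just once.
function renderRows(array $rows): string
{
    $out = '';
    foreach ($rows as $row) {
        $out .= $row . "END ROW\n";
    }
    return $out;
}

echo withSemicolon(array(1, 2, 3, 4)), "\n";    // prints 4
echo withoutSemicolon(array(1, 2, 3, 4)), "\n"; // prints 1234
echo renderRows(array('row A', 'row B'));       // one "END ROW" per row
```

Removing the semicolon from both foreach lines in FindRaceRows should likewise make each loop body run once per matched row and once per link.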