I need help extracting football standings from a website with PHP

Date: 2016-07-26 15:23:57

Tags: php html dom web-scraping html-table

I need to get an HTML table with PHP. How can I do that?

Reference:

Table to parse: http://epsachaias.gr/?page_id=221&cat=4&group=151

I tried:

parseMailingAddyImacroOutput

But I get the following error:

Fatal error: Class 'team' not found

Then I tried adding this code:

    <div class="row">
    <div class="col-md-8">
        <h1>Standings</h1>
        <?php 
            $html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=206');

            $dom = new DOMDocument();
            $internalErrors = libxml_use_internal_errors(true);
            $dom->loadHTML($html);
            libxml_use_internal_errors($internalErrors);

            $tables = $dom->getElementsByTagName('table');
            $trs = $tables->item(1)->getElementsByTagName('tr');

            $output = [];
            for ($itr = 0; $itr < $trs->length; $itr++) 
            {
                $tds = $trs->item($itr)->getElementsByTagName('td');
                if ($tds->length == 22) 
                {
                    $output[] = new team($tds);
                }
            }

            var_dump($output);
        ?>

But I got the same error again.
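The error appears because the snippet calls new team($tds) but never declares a team class; PHP only knows about classes that are defined in the running script or loaded into it (for example with require_once or an autoloader). A minimal sketch of loading a class from a separate file, assuming a hypothetical file that holds the class; here a tiny stand-in is written to the temporary directory only so the example runs on its own:

```php
<?php
// Hypothetical stand-in for a file such as team.php that would hold the
// answer's `team` class; written to the temp dir only so this demo is runnable.
$classFile = sys_get_temp_dir() . '/team_demo_' . getmypid() . '.php';
file_put_contents($classFile, '<?php class team { public $name = "demo"; }');

// Without this require_once, the `new team()` below fails with
// "Class 'team' not found" (PHP 8 prints: Class "team" not found).
require_once $classFile;

$t = new team();
echo $t->name, PHP_EOL; // prints "demo"

unlink($classFile); // clean up the temporary file
```

In practice you would either paste the class definitions above the parsing code or keep them in their own file and require_once it before the loop that creates the objects.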

1 Answer:

Answer 0 (score: 1)

Be aware that this solution relies on the site's current structure.
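Because the solution depends on the page structure, one way to make it a little less fragile is to locate the table by the text of a header cell instead of by its position among all table elements. A minimal sketch using DOMXPath, run against a made-up inline HTML snippet; the header text ΟΜΑΔΑ is an assumption about the real page, so adjust the query to whatever the live markup actually contains:

```php
<?php
// Made-up markup standing in for the real page; the http-equiv meta tag tells
// libxml to decode the document as UTF-8 so the Greek header text survives.
$html = <<<'HTML'
<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body>
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><th>#</th><th>ΟΜΑΔΑ</th><th>Pts</th></tr>
  <tr><td>1</td><td> Team A </td><td>30</td></tr>
  <tr><td>2</td><td>Team B</td><td>27</td></tr>
</table>
</body></html>
HTML;

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep HTML parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// Innermost table (one that contains no nested table) with a cell whose text is exactly "ΟΜΑΔΑ".
$query = '//table[.//th[normalize-space(.)="ΟΜΑΔΑ"] or .//td[normalize-space(.)="ΟΜΑΔΑ"]][not(.//table)]';
$tables = $xpath->query($query);
if ($tables === false || $tables->length === 0) {
    throw new RuntimeException('Standings table not found; the page layout may have changed.');
}

$output = [];
foreach ($xpath->query('.//tr', $tables->item(0)) as $tr) {
    $cells = $xpath->query('./td', $tr); // the header row uses <th>, so it yields 0 cells
    if ($cells->length === 0) {
        continue;
    }
    $row = [];
    foreach ($cells as $td) {
        $row[] = trim($td->textContent); // strip surrounding whitespace from each cell
    }
    $output[] = $row;
}
print_r($output);
```

If the header text changes too, the query fails loudly with an exception instead of silently reading the wrong table, which is usually easier to debug than a positional item(1) lookup.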

You can use a DOMDocument object to parse the HTML, which you can fetch with file_get_contents, as in this code:

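<?php
$html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=151');
if ($html === false) {
    die('Could not download the page');
}

$dom = new DOMDocument();
// Suppress warnings about the page's invalid HTML while parsing
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($internalErrors);

// The standings are in the second <table> on the page (index 1)
$tables = $dom->getElementsByTagName('table');
if ($tables->length < 2) {
    die('Standings table not found; the page layout may have changed');
}
$trs = $tables->item(1)->getElementsByTagName('tr');

$output = [];
for ($itr = 0; $itr < $trs->length; $itr++) {
    $tds = $trs->item($itr)->getElementsByTagName('td');

    // Team rows have exactly 22 <td> cells; all other rows are skipped
    if ($tds->length == 22) {
        $row = [];
        for ($itd = 0; $itd < $tds->length; $itd++) {
            $row[] = trim($tds->item($itd)->textContent);
        }
        $output[] = $row;
    }
}

var_dump($output);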

$output is your final array.
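Each entry of $output is a plain list of 22 cell strings. The classes later in this answer read them by fixed position: 0 (position), 2 (name), 3 to 5 (score, ag, dk), three groups of five statistics at indices 6 to 10, 11 to 15 and 16 to 20, and 21 (penalties). A minimal sketch of splitting such a row into those groups with array_slice and array_chunk; the sample row is made up, and the keys simply reuse the answer's property names:

```php
<?php
// Made-up 22-cell row in the same layout the answer's `team` class expects.
$row = ['1', '', 'Team A', '30', '0', '0',
        '10', '0', '0', '25', '5',    // first stats group  (indices 6-10)
        '5', '0', '0', '15', '2',     // second stats group (indices 11-15)
        '5', '0', '0', '10', '3',     // third stats group  (indices 16-20)
        '0'];                         // index 21

// Take the 15 cells at indices 6..20 and cut them into three groups of five.
[$together, $within, $out] = array_chunk(array_slice($row, 6, 15), 5);

$parsed = [
    'position'  => (int) $row[0],
    'name'      => trim($row[2]),
    'score'     => (int) $row[3],
    'together'  => array_map('intval', $together),
    'within'    => array_map('intval', $within),
    'out'       => array_map('intval', $out),
    'penalties' => (int) $row[21],
];
print_r($parsed);
```

This keeps the magic indices in one place, which helps if the site ever adds or removes a column.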

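If you want more convenient access to the parsed data, you can use classes and objects. For example, you can write one class for a group of statistics and another for the team, as in this code: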

class stats
{
    private $n;
    private $i;
    private $or;
    private $cplus;
    private $cminus;

    public function __construct($n, $i, $or, $cplus, $cminus)
    {
        $this->n = $n;
        $this->i = $i;
        $this->or = $or;
        $this->cplus = $cplus;
        $this->cminus = $cminus;
    }

    public function getN()
    {
        return $this->n;
    }

    public function getI()
    {
        return $this->i;
    }

    public function getOr()
    {
        return $this->or;
    }

    public function getCplus()
    {
        return $this->cplus;
    }

    public function getCminus()
    {
        return $this->cminus;
    }
}

class team
{
    private $position;
    private $name;
    private $score;
    private $ag;
    private $dk;
    private $together;
    private $within;
    private $out;
    private $penalties;

    public function __construct(DOMNodeList $nodes)
    {
        if ($nodes->length == 22) {
            $this->position = (int) $nodes->item(0)->textContent;
            $this->name = $nodes->item(2)->textContent;
            $this->score = (int) $nodes->item(3)->textContent;
            $this->ag = (int) $nodes->item(4)->textContent;
            $this->dk = (int) $nodes->item(5)->textContent;

            $this->together = new stats((int)$nodes->item(6)->textContent, 
                                        (int)$nodes->item(7)->textContent, 
                                        (int)$nodes->item(8)->textContent, 
                                        (int)$nodes->item(9)->textContent, 
                                        (int)$nodes->item(10)->textContent);

            $this->within = new stats(  (int)$nodes->item(11)->textContent,
                                        (int)$nodes->item(12)->textContent, 
                                        (int)$nodes->item(13)->textContent, 
                                        (int)$nodes->item(14)->textContent, 
                                        (int)$nodes->item(15)->textContent);

            $this->out = new stats(     (int)$nodes->item(16)->textContent,
                                        (int)$nodes->item(17)->textContent, 
                                        (int)$nodes->item(18)->textContent, 
                                        (int)$nodes->item(19)->textContent, 
                                        (int)$nodes->item(20)->textContent);

            $this->penalties = (int) $nodes->item(21)->textContent;
        } else {
            throw new Exception("Incorrect input data");
        }
    }

    public function getPosition()
    {
        return $this->position;
    }

    public function getName()
    {
        return $this->name;
    }

    public function getScore()
    {
        return $this->score;
    }

    public function getAg()
    {
        return $this->ag;
    }

    public function getDk()
    {
        return $this->dk;
    }

    public function getTogether()
    {
        return $this->together;
    }

    public function getWithin()
    {
        return $this->within;
    }

    public function getOut()
    {
        return $this->out;
    }

    public function getPenalties()
    {
        return $this->penalties;
    }
}

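Since I assume you only want to read the parsed data, I provided only getters for the object properties. If you want to change them, you can add setters or make the properties public (and remove the methods you no longer need).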
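If you are on PHP 8.1 or newer (an assumption; the answer itself works on older versions), constructor property promotion combined with readonly gives the same read-only guarantee with far less code: the properties can be read directly but cannot be reassigned after construction. A sketch of the stats class in that style:

```php
<?php
// Read-only variant of the answer's `stats` class (requires PHP 8.1+).
class stats
{
    public function __construct(
        public readonly int $n,
        public readonly int $i,
        public readonly int $or,
        public readonly int $cplus,
        public readonly int $cminus,
    ) {
    }
}

$s = new stats(21, 3, 2, 60, 20);
echo $s->n, PHP_EOL; // prints 21

try {
    $s->n = 0; // any write after construction is rejected
} catch (Error $e) {
    echo get_class($e), PHP_EOL; // prints Error
}
```

The team class can be converted the same way; callers then use $team->name instead of $team->getName().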

To build a collection filled with the parsed data, you can use this code:

$html = file_get_contents('http://epsachaias.gr/?page_id=221&cat=4&group=151');

$dom = new DOMDocument();
$internalErrors = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($internalErrors);

$tables = $dom->getElementsByTagName('table');
$trs = $tables->item(1)->getElementsByTagName('tr');

$output = [];
for ($itr = 0; $itr < $trs->length; $itr++) {
    $tds = $trs->item($itr)->getElementsByTagName('td');

    if ($tds->length == 22) {
        $output[] = new team($tds);
    }
}

var_dump($output);

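Before, $output was a simple array of values; now it is a collection of objects. For example, if you want the ΟΜΑΔΑ (team name) of the second team together with its value in the ΣΥΝΟΛΟ -> Ν cell, you can simply use this code: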

echo "{$output[1]->getName()}: {$output[1]->getTogether()->getN()}";

and the output will be:

Άνω Καστρίτσι: 21
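Because $output is an ordinary PHP array of objects, standard array handling works on it. A minimal sketch that looks a team up by name instead of by index, using a hypothetical helper findTeamByName() and a small stand-in class TeamStub (also hypothetical) with the same getName() and getScore() methods as the answer's team class, so the example runs without fetching the page:

```php
<?php
// Minimal stand-in for the answer's `team` class: only what this demo needs.
class TeamStub
{
    public function __construct(private string $name, private int $score)
    {
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function getScore(): int
    {
        return $this->score;
    }
}

// Hypothetical helper: return the first team whose trimmed name matches, or null.
function findTeamByName(array $teams, string $name): ?object
{
    foreach ($teams as $team) {
        if (trim($team->getName()) === $name) {
            return $team;
        }
    }
    return null;
}

$output = [new TeamStub('Team A', 30), new TeamStub('Team B', 27)];

$found = findTeamByName($output, 'Team B');
echo $found === null ? 'not found' : "{$found->getName()}: {$found->getScore()}", PHP_EOL;
```

With the real data, pass the array of team objects built above in place of the stub array; the lookup only relies on getName().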