Parsing an HTML page in PHP with curl and XPath

Date: 2017-02-23 21:41:45

Tags: php parsing curl xpath web-scraping

I need to parse this web page, https://www.galliera.it/118, and get the numbers shown under the coloured bars.

Here is my code (it doesn't work!!)...

Any suggestions, examples, or alternatives?

1 answer:

Answer 0 (score: 1):

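Here is a PHP script that extracts the data you asked for into a neatly structured nested array. Look at the script's output and change the structure as you need. Cheers!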

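// Fetch the page; file_get_contents() returns false on failure
$html = file_get_contents("https://www.galliera.it/118");
if ($html === false) {
    die('Could not download the page');
}

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($dom);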

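// find every element whose class list contains the token "row"
// (the concat/normalize-space idiom matches whole class names, so class="arrow" would not match)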
$rows = $finder->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' row ')]");

$data = array();
foreach ($rows as $row) {
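    $heading = $row->getElementsByTagName('h2')->item(0);
    if ($heading === null) {
        // no <h2> heading, so this is not a data row: skip it
        continue;
    }
    $groupName = $heading->textContent;
    $data[$groupName] = array();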

    // find the direct children of this row whose class list contains the token "box"
    $boxes = $finder->query("./*[contains(concat(' ', normalize-space(@class), ' '), ' box ')]", $row);
    foreach ($boxes as $box) {
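        $subheading = $box->getElementsByTagName('h3')->item(0);
        if ($subheading === null) {
            // no <h3> heading, so this box holds no counters: skip it
            continue;
        }
        $subgroupName = $subheading->textContent;
        $data[$groupName][$subgroupName] = array();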

        $listItems = $box->getElementsByTagName('li');
        foreach ($listItems as $li) {

            $class = $li->getAttribute('class');
            $text = $li->textContent;

            if (!strlen(trim($text))) {
                // an empty <li> should be the graph bar, so skip it
                continue;
            }

            // The page shows only whole numbers, so cast to int; change the type or drop the cast if you need to
            $data[$groupName][$subgroupName][] = array('type' => $class, 'value' => (int) $text);
        }
    }
}

echo '<pre>' . print_r($data, true) . '</pre>';
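
The question title mentions curl, while the script above downloads the page with file_get_contents(), which only works for URLs when allow_url_fopen is enabled. As a minimal sketch, the download step could be done with PHP's cURL extension instead. The helper name fetchHtml and the 10-second timeout are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): download a page with
// the cURL extension and return its body, or null on any failure.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // assumed limit for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // assumed limit for the whole request
    ]);

    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // The handle is released automatically when $ch goes out of scope.

    // Treat transport errors and non-2xx responses as failure.
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

With this helper, the first line of the script would become $html = fetchHtml("https://www.galliera.it/118"); followed by a check for null before the string is passed to loadHTML().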

The script's print_r output looks something like this:

Array
(
    [SAN MARTINO - 15:30] => Array
        (
            [ATTESA: 22] => Array
                (
                    [0] => Array
                        (
                            [type] => rosso
                            [value] => 1
                        )

                    [1] => Array
                        (
                            [type] => giallo
                            [value] => 12
                        )

                    [2] => Array
                        (
                            [type] => verde
                            [value] => 7
                        )

                    [3] => Array
                        (
                            [type] => bianco
                            [value] => 2
                        )

                )

            [VISITA: 45] => Array
                (
                    [0] => Array
                        (
                            [type] => rosso
                            [value] => 5
                        )
...
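
Once the data is in this shape it is easy to post-process. The sketch below adds up the counters of each subgroup. The sample array is copied from the first subgroup of the output above (the only one shown in full), and the helper name totalsPerSubgroup is an assumption made for illustration.

```php
<?php
// Sample copied from the output above; only the fully shown subgroup is used.
$data = [
    'SAN MARTINO - 15:30' => [
        'ATTESA: 22' => [
            ['type' => 'rosso',  'value' => 1],
            ['type' => 'giallo', 'value' => 12],
            ['type' => 'verde',  'value' => 7],
            ['type' => 'bianco', 'value' => 2],
        ],
    ],
];

// Hypothetical helper: sum the 'value' entries of every subgroup,
// keeping the same group / subgroup keys as the input array.
function totalsPerSubgroup(array $data): array
{
    $totals = [];
    foreach ($data as $groupName => $subgroups) {
        foreach ($subgroups as $subgroupName => $entries) {
            $totals[$groupName][$subgroupName] = array_sum(array_column($entries, 'value'));
        }
    }
    return $totals;
}

$totals = totalsPerSubgroup($data);
echo $totals['SAN MARTINO - 15:30']['ATTESA: 22'], PHP_EOL; // prints 22
```

In this sample the computed total matches the number in the "ATTESA: 22" heading, which gives a quick sanity check that the counters were parsed correctly.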