Document parsing - getting data from a public website, e.g. flashscore.com

Time: 2016-03-20 11:32:33

Tags: php parsing dom web-scraping

I want to scrape all the matches and leagues using DOM or simple_html_dom. I have some code for this:

<?php
// Fetch the HTML returned by the league page
$html = file_get_contents('http://www.flashscore.com/soccer/france/ligue-1/');

$pokemon_doc = new DOMDocument();

// Collect libxml parse errors internally instead of emitting warnings
libxml_use_internal_errors(true);

if (!empty($html)) { // only parse if some HTML was actually returned
    $pokemon_doc->loadHTML($html);
    libxml_clear_errors(); // discard the errors caused by malformed HTML

    $pokemon_xpath = new DOMXPath($pokemon_doc);

    // Select every <td> element that has an id attribute
    $pokemon_row = $pokemon_xpath->query('//td[@id]');

    if ($pokemon_row->length > 0) {
        foreach ($pokemon_row as $row) {
            echo $row->nodeValue . "<br/>";
        }
    }
}
?>

With this code I do not get any matches. Any ideas why?
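
One way to narrow the problem down is to run the same `//td[@id]` query against a small inline HTML string. The sketch below does that; the sample markup and the names `$sample` and `$cells` are made up for illustration and are not flashscore.com's real HTML. If the query works here but returns nothing on the live page, one plausible explanation (an assumption, not verified here) is that the cells are simply not present in the HTML that `file_get_contents` receives, for example because the page fills them in with JavaScript after loading.

```php
<?php
// Hypothetical static markup, for illustration only
$sample = '<table>'
    . '<tr><td id="m1">Team A - Team B</td><td>no id here</td></tr>'
    . '<tr><td id="m2">Team C - Team D</td><td>no id here</td></tr>'
    . '</table>';

// Collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the text of every <td> that carries an id attribute
$cells = [];
foreach ($xpath->query('//td[@id]') as $td) {
    $cells[] = trim($td->nodeValue);
}

print_r($cells);
```

Dumping the first few hundred characters of `$html` from the original script is a quick way to see what the server actually returned.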

1 answer:

Answer 0 (score: -1)

<?php

// Include our tag-grabbing class

require("scrab.php"); // class for spider

// Enter the URL you want to run

$urlrun="http://www.livescore.com/";


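// Specify the start and end tags you want to grab data between.
// Only one pair can be active: a second assignment overwrites the first.
// $stag = "<a href=";   // alternative pair: grab links
// $etag = "</a>";
$stag = "<div class=";
$etag = "</div>";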

// Make a title spider
$tspider = new tagSpider();

// Pass URL to the fetch page function
$tspider->fetchPage($urlrun);

// Enter the tags into the parse array function
$linkarray = $tspider->parse_array($stag, $etag);

echo "<h2>Links present on page: ".$urlrun."</h2><br />";
// Loop over the results and print each one
foreach ($linkarray as $result) {
    echo $result;
    echo "<br/>";
}

?>
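
The answer does not show `scrab.php`, so what `tagSpider::parse_array` does internally is unknown. As a rough sketch of the general idea of grabbing text between a start tag and an end tag, a plain string search could look like the following. The function name `extract_between` and the sample markup are hypothetical and are not taken from the answer's class.

```php
<?php
/**
 * Hypothetical helper: return every substring found between $start and $end.
 * An illustrative stand-in, not the actual tagSpider::parse_array code.
 */
function extract_between(string $html, string $start, string $end): array
{
    $results = [];
    $offset = 0;

    // Walk through the string, looking for each start tag in turn
    while (($from = strpos($html, $start, $offset)) !== false) {
        $from += strlen($start);           // skip past the start tag itself
        $to = strpos($html, $end, $from);  // find the next end tag
        if ($to === false) {
            break;                         // unterminated tag: stop searching
        }
        $results[] = substr($html, $from, $to - $from);
        $offset = $to + strlen($end);      // continue after the end tag
    }

    return $results;
}

// Made-up sample markup
$html = '<div class="row">first</div><p>skip</p><div class="row">second</div>';
$rows = extract_between($html, '<div class=', '</div>');
print_r($rows);
```

Because this is a plain substring search, it stops at the first closing tag it meets and so does not handle nested elements. A DOM parser, as used in the question, is usually more robust for that.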