I am using the following code to scrape data from an HTML page:
<?php
$url = 'http://www.atletiek.co.za/atletiek.co.za/uitslae/2016ASASASeniors/160415F012.htm';
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($handle);
libxml_use_internal_errors(true); // Prevent HTML errors from displaying
$doc = new DOMDocument();
$doc->loadHTML($html); // get the DOM
$xpath = new DOMXPath($doc); // start a new xPath on our DOM Object
$preBlock = $xpath->query('//pre'); // find all pre (we only got one here)
// get the first of all the pre objects
// get the 'inner value'
// split them by newlines
$preBlockString = explode("\n",$preBlock->item(0)->nodeValue);
$startResultBlock = false;
$i = 0;
// traverse all rows
foreach ($preBlockString as $line){
// once the 'Finals' marker has been seen on an earlier row, start collecting the results
if($startResultBlock){
$result = explode(' ', $line);
// kill all empty entries (originating from all the space characters)
foreach ($result as $key => $value) if (empty($value)) unset($result[$key]);
$results[] = $result;
// my first idea to use list does not work because of all the space characters
// list($results[$i]['number'], $results[$i]['name'], $results[$i]['age'], $results[$i]['team'], $results[$i]['finals'], $results[$i]['wind'], $results[$i]['points']) = explode(" ", $line);
$i++;
}
// if the line is exactly 'Finals', set the marker for the upcoming rows
if(trim($line) == 'Finals'){
$startResultBlock = true;
}
}
var_dump($results);
?>
The output is as follows:
array(43) { [0]=> array(7) { [2]=> string(1) "1" [3]=> string(7) "Stephen" [4]=> string(6) "Mokoka" [16]=> string(2) "31" [17]=> string(3) "Agn" [36]=> string(8) "13:40.81" [40]=> string(1) "8" } [1]=> array(7) { [2]=> string(1) "2" [3]=> string(5) "Elroy" [4]=> string(6) "Gelant" [18]=> string(2) "30" [19]=> string(4) "Acnw" [37]=> string(8) "13:43.43" [41]=> string(1) "7" } [2]=> array(7) { [2]=> string(1) "3" [3]=> string(8) "Sibusiso" [4]=> string(5) "Nzima" [16]=> string(2) "30" [17]=> string(3) "Cga" [36]=> string(8) "13:46.73" [40]=> string(1) "6" }
I am trying to renumber everything so that it looks like this:
array(43) { [0]=> array(7) { [2]=> string(1) "1" [3]=> string(7) "Stephen" [4]=> string(6) "Mokoka" [5]=> string(2) "31" [6]=> string(3) "Agn" [7]=> string(8) "13:40.81" [8]=> string(1) "8" } [1]=> array(7) { [2]=> string(1) "2" [3]=> string(5) "Elroy" [4]=> string(6) "Gelant" [5]=> string(2) "30" [6]=> string(4) "Acnw" [7]=> string(8) "13:43.43" [8]=> string(1) "7" } [2]=> array(7) { [2]=> string(1) "3" [3]=> string(8) "Sibusiso" [4]=> string(5) "Nzima" [5]=> string(2) "30" [6]=> string(3) "Cga" [7]=> string(8) "13:46.73" [8]=> string(1) "6" }
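I have tried all sorts of things, but the data keeps coming out as in the first example. Does anyone have an idea how I can renumber the keys? Alternatively, it would be fine if the keys started from 0 or 1 and were assigned sequentially.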
Answer 0 (score: 1)
<?php
$url = 'http://www.atletiek.co.za/atletiek.co.za/uitslae/2016ASASASeniors/160415F012.htm';
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($handle);
libxml_use_internal_errors(true); // Prevent HTML errors from displaying
$doc = new DOMDocument();
$doc->loadHTML($html); // get the DOM
$xpath = new DOMXPath($doc); // start a new xPath on our DOM Object
$preBlock = $xpath->query('//pre'); // find all pre (we only got one here)
// get the first of all the pre objects
// get the 'inner value'
// split them by newlines
$preBlockString = explode("\n",$preBlock->item(0)->nodeValue);
$startResultBlock = false;
$results = []; // initialise so var_dump() still works if no rows are found
$i = 0;
// traverse all rows
foreach ($preBlockString as $line){
// once the 'Finals' marker has been seen on an earlier row, start collecting the results
if($startResultBlock){
$result = explode(' ', $line);
// drop the empty strings produced by consecutive spaces
// (a strict comparison is used because empty() would also discard a literal "0")
foreach ($result as $key => $value) if ($value === '') unset($result[$key]);
$results[] = $result;
// my first idea to use list does not work because of all the space characters
// list($results[$i]['number'], $results[$i]['name'], $results[$i]['age'], $results[$i]['team'], $results[$i]['finals'], $results[$i]['wind'], $results[$i]['points']) = explode(" ", $line);
$i++;
}
// if the line is exactly 'Finals', set the marker for the upcoming rows
if(trim($line) == 'Finals'){
$startResultBlock = true;
}
}
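/* Re-index each row so that its keys run sequentially, starting at 1:
   array_values() renumbers from 0, then a dummy element is prepended
   and removed again to shift every key up by one */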
$newResult = [];
foreach ($results as $result)
{
$result = array_values($result);
array_unshift($result, '');
unset($result[0]);
$newResult[] = $result;
}
$results = $newResult;
var_dump($results);
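If keys starting at 0 are acceptable, the renumbering step can be reduced to a single `array_values()` call. The sketch below is only an illustration, assuming PHP 7 or later: the sample row is copied from the question's output, while `splitColumns()` is a hypothetical helper (not part of the code above) and the sample line passed to it is made up to resemble a row of the results page. The helper splits on runs of whitespace with `preg_split()` and `PREG_SPLIT_NO_EMPTY`, so the gaps never appear in the first place.

```php
<?php
// Sample row copied from the question's var_dump output: the keys have gaps
// because unset() removes elements without renumbering the remaining ones.
$row = [
    2  => '1',
    3  => 'Stephen',
    4  => 'Mokoka',
    16 => '31',
    17 => 'Agn',
    36 => '13:40.81',
    40 => '8',
];

// array_values() discards the old keys and re-indexes the values from 0.
$fromZero = array_values($row);

// Hypothetical helper: split a line on runs of whitespace and drop empty
// pieces, which yields sequential keys without any unset() pass.
function splitColumns(string $line): array
{
    // preg_split() returns false on failure; fall back to an empty array.
    return preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Made-up line, laid out roughly like a row inside the <pre> block.
$columns = splitColumns('  1 Stephen Mokoka            31 Agn      13:40.81    8');

var_dump($fromZero);
var_dump($columns);
```

Both arrays end up with the same seven values under keys 0 to 6. If the keys have to start at 1, the `array_unshift()` / `unset()` step from the loop above can still be applied to either result.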