PHP: Is there a faster way to get the HTML content of a web page?

Asked: 2016-07-26 16:04:22

Tags: php optimization

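I'm building a parser for web pages, and the processing is very slow. Is there a way to make this code faster? What would you suggest? I think the slowest part is the cURL requests, which take a long time to complete.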
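To check whether the cURL requests really dominate, it helps to measure them before optimizing. A minimal sketch using only core PHP; `timeIt()` is a hypothetical helper, not part of the original code:

```php
<?php
/**
 * Runs $fn once and returns array(return value, elapsed wall-clock seconds).
 * timeIt() is a hypothetical profiling helper.
 */
function timeIt(callable $fn)
{
    $start = microtime(true);            // current time in seconds, as a float
    $result = $fn();
    $elapsed = microtime(true) - $start; // seconds spent inside $fn
    return array($result, $elapsed);
}

// Hypothetical usage around the original loadHtml() (needs network access):
// list($dom, $secs) = timeIt(function () use ($url) { return loadHtml($url); });
// fwrite(STDERR, sprintf("%.3f s  %s\n", $secs, $url));
```

Inside `loadHtml()`, calling `curl_getinfo($curl, CURLINFO_TOTAL_TIME)` before `curl_close()` returns the transfer time cURL itself measured; comparing it with the total time of `loadHtml()` shows how much goes to downloading versus parsing.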

Here is the code:

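// Requires the simple_html_dom library; PARSE_DAYS is a constant defined elsewhere.
require_once 'simple_html_dom.php';

// Parsing the soccer (football) matches: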
function parseSoccer() {

// Seven days + one extra:
for($i = 0; $i <= PARSE_DAYS; $i++) {
    // Starting webpage
    $base = "http://sports.williamhill.com/bet/en-gb/betting/y/5/tm/" . $i . "/Football.html";

    // Loading the starting webpage:
    $html = loadHtml($base);

    foreach($html->find('div#ip_sport_0_types') as $ip_sport_0_types) {

        // Finding league names and links:
        foreach($ip_sport_0_types->find('h3') as $h3) {

            // Link and URL of the league:
            $leagueLink = $h3->find('a')[0];
            $leagueUrl = $leagueLink->href;

            // League name:
            $leagueName = $leagueLink->innertext;

            // League page:
            $leagueHtml = loadHtml($leagueLink->href);

            // Upcoming matches headers:
            $upcomingHeaders = $leagueHtml->find('#upcomingHeader');

            // If there are some upcoming matches:
            if(count($upcomingHeaders) > 0) {

                // Finding the rows with the odds of the match:
                $rowOdds = $upcomingHeaders[0]->parent()->find('tr.rowOdd');

                // Finding the match link:
                foreach ($rowOdds as $rowOdd) {

                    // Data which will be sent to the endpoint:
                    $data = array();
                    $data['timestamp'] = time();
                    $m = array();

                    // This is the match link:
                    $matchLink = $rowOdd->find('td')[2]->find('a')[0];
                    $matchUrl = $matchLink->href;

                    // Match page:
                    $matchHtml = loadHtml($matchUrl);

                    // Parsing the match date. The match is assumed to start at the
                    // "Bet until" time shown in the event header:
                    $matchDate = trim($matchHtml->find('span#eventDetailsHeader')[0]->find('span')[0]->innertext);
                    $matchDate = explode(' : ', $matchDate);
                    $matchDate = trim($matchDate[1]);
                    $matchDate = explode(' ', $matchDate);
                    $matchDate = date("m", strtotime($matchDate[1])) . "/" . $matchDate[0] . "/" . date("Y") . " " . str_replace("-", "", $matchDate[3]);
                    $matchDate = strtotime($matchDate);
                    $matchDate = date("Y-m-d H:i:s", $matchDate);

                    // Parsing the teams and the odds:
                    // Team A (home team):
                    $teamAHolder = $matchHtml->find('div.eventpriceholder-left')[0];
                    $teamAOdds = trim($teamAHolder->find('div.eventprice')[0]->innertext);
                    $teamAName = trim($teamAHolder->find('div.eventselection')[0]->innertext);
                    // strpos() returns false (not -1) when "/" is missing; skip this match:
                    if(strpos($teamAOdds, "/") === false) continue;
                    // Convert fractional odds "N/D" to N / D without eval() on scraped text:
                    list($num, $den) = explode("/", $teamAOdds, 2);
                    $teamAOdds = number_format((float)$num / (float)$den, 2);

                    // Draw:
                    $drawHolder = $matchHtml->find('div.eventpriceholder-left')[1];
                    $drawOdds = trim($drawHolder->find('div.eventprice')[0]->innertext);
                    if(strpos($drawOdds, "/") === false) continue;
                    list($num, $den) = explode("/", $drawOdds, 2);
                    $drawOdds = number_format((float)$num / (float)$den, 2);

                    // Team B (away team):
                    $teamBHolder = $matchHtml->find('div.eventpriceholder-right')[0];
                    $teamBOdds = trim($teamBHolder->find('div.eventprice')[0]->innertext);
                    $teamBName = trim($teamBHolder->find('div.eventselection')[0]->innertext);
                    if(strpos($teamBOdds, "/") === false) continue;
                    list($num, $den) = explode("/", $teamBOdds, 2);
                    $teamBOdds = number_format((float)$num / (float)$den, 2);

                    // Storing data into variables:
                    $m["match_url"] = $matchUrl;
                    $m["match_page_url"] = $leagueUrl;
                    $m["match_name"] = $teamAName . " vs " . $teamBName;
                    $m["match_league"] = $leagueName;
                    $m["date_start"] = $matchDate;
                    $m["match_type"] = "Soccer";

                    $m["market_name"] = "Match Result";
                    $m["is_main_market"] = 1;
                    $m["selection_1_name"] = $teamAName;
                    $m["selection_2_name"] = "Draw";
                    $m["selection_3_name"] = $teamBName;
                    $m["selection_1"] = $teamAOdds;
                    $m["selection_2"] = $drawOdds;
                    $m["selection_3"] = $teamBOdds;
                    $data["matches"][] = $m;

                    print_r($data);

                    echo "\n\n-------------" . $i . "-----------\n\n";

                    unset($matchHtml);

                }

            }

            unset($leagueHtml);

        }


    }

    unset($html);
}
}
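The odds conversion can be factored into a small helper that validates the scraped text before dividing, so nothing from the page is ever executed as PHP (as `eval()` would do). A minimal sketch, assuming the odds appear as "N/D" (for example "5/2"); `fractionToNumber()` is a hypothetical name. Like the original code it returns N / D (2.5 for "5/2"), not bookmaker decimal odds, which would be N / D + 1.

```php
<?php
/**
 * Converts fractional odds text such as "5/2" into N / D (here 2.5).
 * Returns null if the text is not of the form "N/D" with D > 0.
 * fractionToNumber() is a hypothetical helper name.
 */
function fractionToNumber($text)
{
    // Accept only digits, a slash, and digits (optional spaces around the slash).
    if (!preg_match('~^(\d+)\s*/\s*(\d+)$~', trim($text), $m)) {
        return null;
    }
    $den = (float) $m[2];
    if ($den == 0.0) {
        return null; // malformed odds such as "3/0" would divide by zero
    }
    return (float) $m[1] / $den;
}

// Example: number_format(fractionToNumber("5/2"), 2) gives "2.50".
```

In the match loop, `$teamAOdds = fractionToNumber($teamAOdds); if ($teamAOdds === null) continue;` would replace the separate "/" check and conversion for each selection.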

/**
 * Downloads $url with cURL and returns it as a simple_html_dom object.
 */
function loadHtml($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Make curl_exec() return the response body instead of printing it:
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    $str = curl_exec($curl);
    curl_close($curl);

    // Parse the downloaded HTML:
    $html = new simple_html_dom();
    $html->load($str);
    return $html;
}
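`loadHtml()` creates a new cURL handle for every page, and `parseSoccer()` waits for each page to finish before requesting the next, so most of the run time is likely spent waiting on the network. A common way to cut that is to download independent pages (for example, all match pages of one league) concurrently with PHP's `curl_multi_*` functions. A minimal sketch, not tested against this site; `fetchAll()` is a hypothetical name and the timeout values are arbitrary assumptions:

```php
<?php
/**
 * Downloads several URLs concurrently and returns an array mapping each URL
 * to its response body, or false when no 2xx response was received.
 * fetchAll() is a hypothetical helper; timeouts are arbitrary.
 */
function fetchAll(array $urls)
{
    $multi = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_setopt($ch, CURLOPT_ENCODING, ''); // accept any compression cURL supports
        curl_multi_add_handle($multi, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // some libcurl versions return -1 here; avoid a busy loop
        }
    } while ($running && $status === CURLM_OK);

    // Collect bodies; HTTP code 0 means the connection itself failed.
    $results = array();
    foreach ($handles as $url => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $results[$url] = ($code >= 200 && $code < 300) ? curl_multi_getcontent($ch) : false;
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);
    return $results;
}
```

For example, the `$matchUrl` values of one league could be collected first, fetched with a single `fetchAll()` call, and each body then parsed with `$dom = new simple_html_dom(); $dom->load($body);`. Consider splitting very long URL lists into smaller batches so the site is not flooded with simultaneous requests.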

0 answers