Question

I am trying to scrape the 538 baseball odds site.

When I paste the URL into Chrome and view source, it looks something like standard HTML.

When I scrape the data (I have used both the code below and file_get_contents with the same results) I get something that looks like: ��}ks�8�� >��j�[�ߔ8��

I have tried the code on simpler sites without issue. Is the site somehow blocking my GET request?

<?php

function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}

$output = curl('https://projects.fivethirtyeight.com/2017-mlb-predictions/games/');
echo $output;

?>

Answer 1

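The response body is most likely compressed (for example with gzip), and cURL is handing you the raw compressed bytes. Set CURLOPT_ENCODING to an empty string: cURL then sends an Accept-Encoding header listing every encoding it supports and decodes the response automatically.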

<?php

function curl($url) {
    $ch = curl_init();  // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url);    // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    curl_setopt($ch, CURLOPT_ENCODING, ""); // Empty string: accept every encoding cURL supports and decode the response automatically
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch);    // Closing cURL
    return $data;   // Returning the data from the function
}

$output = curl('https://projects.fivethirtyeight.com/2017-mlb-predictions/games/');
echo $output;

?>
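
If you want to confirm that the junk really is compressed data, or if you are using file_get_contents where CURLOPT_ENCODING is not available, you can check for the gzip magic bytes and decode the body yourself. This is only a minimal sketch: it assumes the server used gzip (the question does not confirm which encoding was sent), `looksGzipped` and `decodeBody` are hypothetical helper names, and `gzdecode` requires PHP's zlib extension. The compressed input is simulated locally so the example runs without network access.

```php
<?php

// Hypothetical helper: true if $body starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(string $body): bool {
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: decode the body if it looks gzip-compressed,
// otherwise return it unchanged.
function decodeBody(string $body): string {
    if (!looksGzipped($body)) {
        return $body;
    }
    $decoded = gzdecode($body); // Needs the zlib extension
    return $decoded === false ? $body : $decoded;
}

// Simulate a gzip-compressed response (assumption: stands in for the scraped page).
$compressed = gzencode('<html><body>odds</body></html>');
echo decodeBody($compressed), "\n";
```

In practice you would pass the string returned by file_get_contents or curl_exec to `decodeBody` instead of the simulated value. Letting cURL handle decoding via CURLOPT_ENCODING is simpler when cURL is available, since it also covers other encodings such as deflate.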
