I am trying to scrape the 538 baseball odds site.
When I paste the URL into Chrome and view the source, it looks like standard HTML.
When I fetch the page from PHP (I have tried both the code below and file_get_contents,
with the same results), I get something that looks like:
��}ks�8����������� >��j�[�ߔ8��
I have tried the code on simpler sites without issue. Is the site somehow blocking my GET request?
<?php
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
$output = curl('https://projects.fivethirtyeight.com/2017-mlb-predictions/games/');
echo $output;
?>
Answer 0 (score: 1)
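The site is probably not blocking you. The output looks like a compressed (most likely gzip) response body that was never decoded.
Set CURLOPT_ENCODING on the cURL handle. An empty string makes cURL send an Accept-Encoding header listing every encoding it supports and decode the response automatically.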
<?php
function curl($url) {
    $ch = curl_init(); // Initialising cURL
    curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
    curl_setopt($ch, CURLOPT_ENCODING, ""); // Empty string: accept all supported encodings and decode the response automatically
    $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}
$output = curl('https://projects.fivethirtyeight.com/2017-mlb-predictions/games/');
echo $output;
?>
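
The question also mentions file_get_contents, which as far as I know does not decode a compressed body on its own the way cURL does with CURLOPT_ENCODING. A minimal sketch of one way to handle that case, assuming the body is gzip-compressed and the zlib extension is loaded; maybe_gunzip is a hypothetical helper name, not something from the original post:

```php
<?php
// Hypothetical helper (not part of the original post): decompress a response
// body only if it starts with the gzip magic bytes 0x1f 0x8b.
function maybe_gunzip(string $data): string {
    if (strncmp($data, "\x1f\x8b", 2) === 0) {
        $decoded = @gzdecode($data); // needs the zlib extension; false on failure
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $data; // not gzip, or decoding failed: return the input unchanged
}
```

It could then wrap the existing call, for example echo maybe_gunzip(file_get_contents($url)); with $url being the page address. Checking the magic bytes first means plain, uncompressed responses pass through untouched.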