Question

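I want to scrape some data from Soccerstats.com using the PHP Simple HTML DOM parser, but I can't, because a cookie page always appears before the normal page loads. How can I bypass the cookie page? My code is: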

<?php
    include_once('../scrapper/scrapper.php');
    $url = 'https://www.soccerstats.com/matches.asp';
    $html = file_get_html($url);

    $stats = array();
    foreach($html->find('table') as $table) {
        $stats[] = $table->outertext;
    }
    $results = implode(",", $stats);    

    echo $results; 
?>

Answer 1

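A quick look at the page https://www.soccerstats.com/matches.asp shows that the "cookie page" only asks the user to click a button which, when clicked, simply sets a cookie named cookiesok to the value yes, as seen in the page's source:

<button class="button button3" onclick=" setCookielocal('cookiesok', 'yes', 365)"><font size='4'>I agree. Continue to website.</font></button>

So all we need to do is make PHP fetch the page with this cookie set.

Since you are using the simplehtmldom library (https://sourceforge.net/projects/simplehtmldom/) and its file_get_html() function, I looked at that function's source and found that it uses file_get_contents() under the hood. It also lets us pass our own "context", which can be created with stream_context_create().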

In short, stream_context_create() lets us create a context containing the required cookies, which file_get_html() then uses when it requests the page.

Final code:

file_get_html()
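A minimal sketch of this approach, assuming the cookie name and value seen in the button's onclick handler (cookiesok=yes) and that file_get_html() accepts a stream context as its third argument, as it does in simple_html_dom 1.x. The helper buildCookieContext() is a hypothetical name introduced here for illustration, and the include path is the one from the question.

```php
<?php
// Path taken from the question; adjust to wherever simple_html_dom lives.
$library = __DIR__ . '/../scrapper/scrapper.php';
if (is_file($library)) {
    include_once $library;
}

/**
 * Hypothetical helper: build a stream context that sends the given
 * cookies in a "Cookie" request header.
 */
function buildCookieContext(array $cookies)
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }

    // The 'http' options also apply to https:// URLs.
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ),
    ));
}

// Assumption: the site only checks for cookiesok=yes, as set by the button.
$context = buildCookieContext(array('cookiesok' => 'yes'));

// Only attempt the request when the parser library was actually loaded.
if (function_exists('file_get_html')) {
    $url  = 'https://www.soccerstats.com/matches.asp';
    $html = file_get_html($url, false, $context);

    if ($html === false) {
        echo 'Could not load ' . $url;
    } else {
        $stats = array();
        foreach ($html->find('table') as $table) {
            $stats[] = $table->outertext;
        }
        echo implode(',', $stats);
    }
}
```

If the site later changes the cookie name or adds further checks (for example on the User-Agent header), the array passed to buildCookieContext() and the 'header' string are the places to adjust.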

Scraping a page in PHP
