PHP cURL scrape of ASPX page not showing data

Date: 2014-05-02 06:39:24

Tags: php asp.net ajax curl

I am trying to scrape this page with PHP cURL: http://www.newhorizonssc.com/localweb/catalog/coursecatalog.aspx?GroupId=402&keyword=infopath

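However, when I run that URL through cURL and echo the result, I get the outline of the page, but the data table in the middle of the page is missing.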

Result of opening the URL in a browser: Good result

Result of running the URL through cURL: Bad results

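However, when I look at the response for the requested HTML page in Firebug, it shows the same empty area that I get (so if I'm matching that, my headers are probably fine?): headers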

Obviously, I can't scrape the data from the table when it isn't being displayed.
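A quick way to confirm that the table is missing from the HTML the server sends (rather than being lost by the scraping code) is to count table rows in the raw response. This is a minimal sketch using PHP's DOM extension; `countRows()` is a hypothetical helper, and the default XPath `//table//tr` is an assumption that should be narrowed to the real table once its id is known from the page source.

```php
<?php
// Sketch: count the <tr> elements in raw HTML, to check whether the
// server-sent page contains any table rows at all.
// countRows() is a hypothetical helper; the default XPath is an assumption.
function countRows(string $html, string $xpathQuery = '//table//tr'): int
{
    if ($html === '') {
        return 0; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world ASPX markup is rarely valid HTML; silence libxml warnings
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($xpathQuery);
    return $nodes === false ? 0 : $nodes->length;
}
```

For example, `countRows($data)` on the string returned by the cURL function; a result of 0 is consistent with the rows being added later by JavaScript in the browser.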

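I've been at this all day, going through questions asked here, linked tutorials, and Google. Clearly, getting data out of ASPX forms with PHP cURL isn't the easiest thing, but nothing has worked.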

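At first I assumed that, because I can append "?GroupId=402&keyword=infopath" to the end of the URL, a simple GET would do it. Since it doesn't, I figured there must be some kind of validation or something else going on.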
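For reference, the plain GET described above can be written as follows. This is a minimal sketch; `buildCatalogUrl()` and `fetchGet()` are hypothetical helper names, and only the base URL and the `GroupId`/`keyword` parameters come from the question. If the table is populated by XHR after the page loads, this GET will return the same empty table area.

```php
<?php
// Sketch: build the catalog URL from its query parameters and fetch it
// with a plain GET. buildCatalogUrl() and fetchGet() are hypothetical names.
function buildCatalogUrl(int $groupId, string $keyword): string
{
    $base = 'http://www.newhorizonssc.com/localweb/catalog/coursecatalog.aspx';
    // http_build_query() URL-encodes each value; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output ini setting
    return $base . '?' . http_build_query(
        ['GroupId' => $groupId, 'keyword' => $keyword],
        '',
        '&'
    );
}

function fetchGet(string $url, string $cookieFile = 'cookie.txt')
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEJAR      => $cookieFile, // save session cookies
        CURLOPT_COOKIEFILE     => $cookieFile, // and send them back
    ]);
    $html = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $html;
}
```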

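I'm pretty sure I have all the correct header information. But I noticed that the page that produces the good result makes 24 XHR requests, while my cURL page makes 0. I think I somehow need to make an AJAX call to pull up that table, but I'm lost on how to do that. Furthermore, if I do get this table to display, I will also need to make an AJAX call that simulates clicking the small plus button, which itself makes an AJAX call and displays the list of classes under each course.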
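One way to reproduce those XHR requests is to copy each one's URL, method, headers and body from Firebug's Net panel (XHR tab) and replay it with cURL, reusing the cookie file from the first GET. The sketch below assumes a JSON POST, which is common for ASP.NET AJAX endpoints but is not confirmed for this site; the URL, body and `Content-Type` must be replaced with whatever Firebug actually shows, and `xhrOptions()`/`replayXhr()` are hypothetical helpers.

```php
<?php
// Sketch: replay one of the page's XHR requests directly with cURL.
// ASSUMPTION: the endpoint takes a JSON POST; copy the real URL, body
// and headers from Firebug. xhrOptions()/replayXhr() are hypothetical.
function xhrOptions(string $url, string $jsonBody, string $referer, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $jsonBody,
        CURLOPT_REFERER        => $referer,
        // Many servers check these headers to recognize an AJAX call
        CURLOPT_HTTPHEADER     => [
            'X-Requested-With: XMLHttpRequest',
            'Content-Type: application/json; charset=utf-8',
        ],
        // Reuse the session cookie obtained by the normal page GET
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
}

function replayXhr(string $url, string $jsonBody, string $referer, string $cookieFile = 'cookie.txt')
{
    $ch = curl_init();
    curl_setopt_array($ch, xhrOptions($url, $jsonBody, $referer, $cookieFile));
    $response = curl_exec($ch); // often JSON or an HTML fragment, not a full page
    curl_close($ch);
    return $response;
}
```

The request fired by the plus button could be handled the same way: replay it with the parameters Firebug records for that click.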

Here is the entire cURL function I'm using:

    private function __curl($url) {
        // form field names/values for the course search box on the ASPX page
        $nameCourseSearch = 'ctl00$uxContentBody$txtSearch';
        $valCourseSearch  = 'infopath';
        $nameSearchBtn    = 'ctl00$uxContentBody$btnSearch';
        $valSearchBtn     = 'GO';

        // the path to a file we can read/write; this will
        // store the session cookies the server issues between requests
        $cookieFile = 'cookie.txt';

        // regular expressions to parse out the special ASP.NET
        // values for __VIEWSTATE and __EVENTVALIDATION;
        // [^"]* stops at the closing quote instead of greedily
        // running to the last quote on the line
        $regexViewstate = '/__VIEWSTATE" value="([^"]*)"/i';
        $regexEventVal  = '/__EVENTVALIDATION" value="([^"]*)"/i';

        /************************************************
        * utility closure: regexExtract
        *    use the given regular expression to extract
        *    capture group $nthValue from the given text;
        *    returns "" when there is no match.
        *    A closure is used because a named function
        *    declared inside a method causes a
        *    "Cannot redeclare" fatal error the second
        *    time the method is called.
        ************************************************/
        $regexExtract = function ($text, $regex, $nthValue) {
            if (preg_match($regex, $text, $regs)) {
                return $regs[$nthValue];
            }
            return '';
        };

        $ch = curl_init();

        // options shared by every request; the cookie jar/file is set
        // before the first request so that the session cookie from
        // the GET is sent back with the POST
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/4");
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

        /************************************************
        * first, issue a GET call to the ASP.NET page.
        *   This is necessary to retrieve the
        *   __VIEWSTATE and __EVENTVALIDATION values
        *   that the server issues
        ************************************************/
        curl_setopt($ch, CURLOPT_URL, $url);
        $data = curl_exec($ch);

        // from the returned html, parse out the __VIEWSTATE and
        // __EVENTVALIDATION values
        $viewstate = $regexExtract($data, $regexViewstate, 1);
        $eventval  = $regexExtract($data, $regexEventVal, 1);

        // http_build_query() URL-encodes every value itself, so the values
        // are passed raw here (rawurlencode() would encode them twice)
        $postData = array(
            '__VIEWSTATE'       => $viewstate,
            '__EVENTVALIDATION' => $eventval,
            'ctl00_ContentPlaceHolder1_tc1_ClientState' => '{"ActiveTabIndex":0,"TabState":[true,true]}',
            $nameCourseSearch   => $valCourseSearch,
            $nameSearchBtn      => $valSearchBtn,
        );

        // post the search form back to the page
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in $data
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postData));
        $data = curl_exec($ch);

        /************************************************
        * issue a GET to the page again; CURLOPT_HTTPGET
        * switches the handle from POST back to GET, and
        * the cookie file sends the session cookie back
        * to the server
        ************************************************/
        curl_setopt($ch, CURLOPT_HTTPGET, true);
        curl_setopt($ch, CURLOPT_URL, $url);
        $data = curl_exec($ch);

        if ($data === false) {
            echo "<br />cURL error number: " . curl_errno($ch);
            echo "<br />cURL error: " . curl_error($ch) . " on URL - " . $url;
            var_dump(curl_getinfo($ch));
            curl_close($ch);
            exit;
        }

        curl_close($ch);
        return $data;
    }
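As an alternative to the two regular expressions above, every hidden form field (including `__VIEWSTATE`, `__EVENTVALIDATION` and, when present, `__VIEWSTATEGENERATOR`, `__EVENTTARGET` and `__EVENTARGUMENT`) can be read with DOMDocument, which does not depend on attribute order or quoting. A minimal sketch; `extractHiddenFields()` is a hypothetical helper, not part of any library.

```php
<?php
// Sketch: collect every hidden <input name="..." value="..."> from a page.
// extractHiddenFields() is a hypothetical helper name.
function extractHiddenFields(string $html): array
{
    $fields = [];
    if ($html === '') {
        return $fields; // DOMDocument::loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        // getAttribute() returns the entity-decoded value ("&amp;" -> "&")
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}
```

The result could stand in for the hand-built `__VIEWSTATE`/`__EVENTVALIDATION` entries, e.g. `array_merge(extractHiddenFields($data), array($nameCourseSearch => $valCourseSearch, $nameSearchBtn => $valSearchBtn))`, before it is passed to `http_build_query()`.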

Sorry this is so long. I just wanted to make sure I posted all the information I thought was needed.

Thanks.

0 answers