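I want to get data from the following page: http://kovv.mavari.be/kalender.aspx, specifically the page you get when you press the submit button with no value selected in the dropdown lists (you then see a page with a large table).
I tried following the tutorial found here: http://www.mishainthecloud.com/2009/12/screen-scraping-aspnet-application-in.html.
This is what I have so far: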
public function teamsoostVlaanderen()
{
    $url = "http://kovv.mavari.be/kalender.aspx";
    $regs = array();
    $cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';

    // regular expressions to parse out the special ASP.NET
    // values for __VIEWSTATE and __EVENTVALIDATION
    $regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
    $regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    $data = curl_exec($ch);

    $viewstate = $this->regexExtract($data, $regexViewstate, $regs, 1);
    $eventval = $this->regexExtract($data, $regexEventVal, $regs, 1);

    $postData = '__VIEWSTATE='.rawurlencode($viewstate)
        .'&__EVENTVALIDATION='.rawurlencode($eventval)
        .'&ctl00_ContentPlaceHolder1_ddlGeslacht'
        .'&ctl00$ContentPlaceHolder1$ddlReeks'
        .'&ctl00_ContentPlaceHolder1_ddlDatum'
        .'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
    ;

    curl_setOpt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
    $data = curl_exec($ch);
    echo $data;
    curl_close($ch);
    die();
}

public function regexExtract($text, $regex, $regs, $nthValue)
{
    if (preg_match($regex, $text, $regs)) {
        $result = $regs[$nthValue];
    }
    else {
        $result = "";
    }
    return $result;
}
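But I still don't get the page that the POST should return (so no table). When I check my cookie file (cookie.txt), it is empty; could that be the problem? Can someone help me find what is wrong?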
Answer 0 (score: 1)
Use proper regular expressions (the greedy (.*) can run past the closing quote of the value):
$regexViewstate = '/__VIEWSTATE\" value=\"([^"]*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"([^"]*)\"/i';
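To see why the capture group matters, here is a minimal sketch that runs both patterns against a made-up snippet. The $html string is a hypothetical sample, not the real markup of the page; it assumes both hidden inputs end up on the same line, which is when the greedy version goes wrong.

```php
<?php
// Hypothetical sample markup: two hidden inputs on a single line.
$html = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />'
      . '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />';

// Original pattern from the question: (.*) is greedy and stops at the LAST quote on the line.
$greedy = '/__VIEWSTATE\" value=\"(.*)\"/i';
// Corrected pattern: [^"]* stops at the first closing quote.
$strict = '/__VIEWSTATE\" value=\"([^"]*)\"/i';

preg_match($greedy, $html, $greedyMatch);
preg_match($strict, $html, $strictMatch);

// The greedy capture swallows the markup that follows the value;
// the strict capture contains only the attribute value.
var_dump($greedyMatch[1]);
var_dump($strictMatch[1]);
```

With the greedy pattern, the captured "viewstate" includes part of the following tag, so the server would most likely reject the posted value.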
The equals signs are missing in the POST parameters:
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
    .'&__EVENTVALIDATION='.rawurlencode($eventval)
    .'&ctl00_ContentPlaceHolder1_ddlGeslacht='
    .'&ctl00$ContentPlaceHolder1$ddlReeks='
    .'&ctl00_ContentPlaceHolder1_ddlDatum='
    .'&ctl00$ContentPlaceHolder1$btnZoek=zoek'
;
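As an alternative to concatenating the body by hand, the following is a rough sketch that lets http_build_query add the equals signs and the URL encoding. The helper names hiddenField, buildPostBody and fetchCalendar are hypothetical. It also assumes that all four controls use the $ separator in their name attribute (ASP.NET WebForms normally uses $ in names and _ in ids), so check the name attributes in the page source before relying on it. Regarding the empty cookie file: setting CURLOPT_COOKIEFILE before the first request turns on cURL's cookie engine, and the jar is only written when the handle is released; whether this site sets any cookie at all is not something the sketch can confirm.

```php
<?php
// Extract the value of a hidden input by its name (hypothetical helper).
function hiddenField(string $html, string $name): string
{
    $pattern = '/' . preg_quote($name, '/') . '\" value=\"([^"]*)\"/i';
    return preg_match($pattern, $html, $m) ? $m[1] : '';
}

// Build the URL-encoded POST body from the HTML of the initial GET.
// Assumption: every control name uses "$" as separator.
function buildPostBody(string $html): string
{
    return http_build_query([
        '__VIEWSTATE'                           => hiddenField($html, '__VIEWSTATE'),
        '__EVENTVALIDATION'                     => hiddenField($html, '__EVENTVALIDATION'),
        'ctl00$ContentPlaceHolder1$ddlGeslacht' => '',
        'ctl00$ContentPlaceHolder1$ddlReeks'    => '',
        'ctl00$ContentPlaceHolder1$ddlDatum'    => '',
        'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',
    ]);
}

// GET the page, then POST the form back on the same handle.
function fetchCalendar(string $url, string $cookieFile): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine
        CURLOPT_COOKIEJAR      => $cookieFile, // written when the handle is released
    ]);

    $html = curl_exec($ch); // first request: plain GET
    if ($html === false) {
        return '';
    }

    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, buildPostBody($html));
    $result = curl_exec($ch); // second request: the form post

    unset($ch); // release the handle so libcurl flushes the cookie jar
    return $result === false ? '' : $result;
}
```

Usage would look like echo fetchCalendar('http://kovv.mavari.be/kalender.aspx', '/tmp/cookie.txt'); where the cookie path is only an example and should point to a location the PHP process can write to.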