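I am scraping data from http://www.magicbricks.com/bricks/agentSearch.html using the SimpleTest WebBrowser. Although everything looks correct, I always get the error "City Field is required". I suspect the problem is that the values in the city field change dynamically when the value of State changes. Is there a solution? Here is my code.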
<?php
require_once('simpletest/browser.php');
$browser = new SimpleBrowser();
$browser->addHeader('User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2');
$browser->get('http://www.magicbricks.com/bricks/agentSearch.html');
$browser->setField('source','agentSearch');
$browser->setField('_transactionType','1');
$browser->setField('_propertyType','1');
$browser->setField('resultPerPage','50');
$browser->setField('agentSearchType','B');
$browser->setField('state','520');
$browser->setField('city','4320');
$browser->setField('keyword','');
$browser->setField('country','50');
print $browser->submitFormById('searchFormBean');
print $browser->getResponseCode();
?>
Answer 0 (score: 0)
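Here are a few mistakes I noticed:
Fields are missing: the request below sends transactionType and propertyType, which your code never sets.
You need to add some header information, for example the headers in the request below.
If you look at the headers, a typical POST request should be in this format:
POST http://www.magicbricks.com/bricks/agentSearch.html HTTP/1.1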
Host: www.magicbricks.com
Connection: keep-alive
Content-Length: 173
Cache-Control: max-age=0
Origin: http://www.magicbricks.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.79 Safari/535.11
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.magicbricks.com/bricks/agentSearch.html
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: JSESSIONID=nF1UqV3DM2tZC42zByYm6Q**.MBAPP09; __utma=163479907.1423216630.1331970312.1331970312.1331970312.1; __utmb=163479907.1.10.1331970312; __utmc=163479907; __utmz=163479907.1331970312.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); _mbRunstats=3k0ilrpcgprh4tea
source=agentSearch&agentSearchType=B&country=51&state=601&city=8417&transactionType=11951&_transactionType=1&propertyType=10001&_propertyType=1&keyword=tesy&resultPerPage=50
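To see which fields are missing, you can compare the captured body against the names your script sets. This is only a rough sketch in plain PHP, with no SimpleTest and no network access. The variable names ($captured, $scripted, $fields, $missing, $body) are made up for illustration, and the values are simply the ones from the capture above, so they may not be right for your search.

```php
<?php
// Request body copied from the captured POST above.
$captured = 'source=agentSearch&agentSearchType=B&country=51&state=601&city=8417'
    . '&transactionType=11951&_transactionType=1&propertyType=10001'
    . '&_propertyType=1&keyword=tesy&resultPerPage=50';

// Field names the script in the question sets via setField().
$scripted = array(
    'source', '_transactionType', '_propertyType', 'resultPerPage',
    'agentSearchType', 'state', 'city', 'keyword', 'country',
);

// Decode the captured body into name => value pairs.
parse_str($captured, $fields);

// Names the real browser sent that the script never sets.
$missing = array_values(array_diff(array_keys($fields), $scripted));
print_r($missing);

// Re-encode the pairs; the byte length can be compared with the
// Content-Length header in the captured request.
$body = http_build_query($fields, '', '&');
echo strlen($body), "\n";
```

If I remember the SimpleTest API correctly, SimpleBrowser also has a post($url, $parameters) method that accepts an associative array, so an array like $fields could probably be sent directly instead of filling in the form, which would sidestep the dynamically populated city list. Treat that as an assumption and check it against your SimpleTest version.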
I hope this helps :D