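I am trying to learn web scraping. I need to get all the URLs from this page: http://www.99acres.com/rent-property-in-chennai-ffid?
To do that, I first need to sort the listings by newest entry, which means replicating the getresults_ajax POST request in my code. Even though the XPath returns valid results in the Chrome console, my code gets an empty list.
I know replicating requests can be tedious, and I have used Selenium and PhantomJS to scrape dynamic pages before, but here I need to sort the content and then read the data from the response, which seems tricky.
My code: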
import requests
from lxml import html

d = {
    'src': 'SORTING_date_d',
    'static_search': 'true',
    '': 'undefined',
    'sortby': 'date_d',
    'lstAcnId': '8930791340597402',
    'encrypted_input': 'UiB8IFFTIHwgUiB8IzIjICB8IGNoZW5uYWkgIzMjfCAgfCBDUDMyIzIyIyB8IDI1MTU3NTg2IHwgIHwgMzIgfCM1IyAgfCBSICM0MCN8ICA=',
    'lstAcn': 'SEARCH',
    'is_ajax': '1'
}
h = {
    'Referrer': 'http://www.99acres.com/rent-property-in-chennai-ffid?orig_property_type=R&search_type=QS&search_location=CP32&pageid=QS&keyword_orig=chennai'
}
req = requests.post(url='http://www.99acres.com/do/quicksearch/getresults_ajax', data=d, headers=h)
r = html.fromstring(req.text)
#print('test 1' + str(req.text))
prices = r.xpath('//div[@title = "View property details"]')
print('test %d' % len(prices))
# driver = webdriver.PhantomJS(executable_path = R'C:\Python27\selenium\webdriver\phantomjs-2.1.1-windows\bin\phantomjs.exe')
for price in prices:
    print('price is this ' + str(price))
Answer 0 (score: 1)
If you print the response text, you will see that it is a JSON response rather than plain HTML:
{"html_ysf":" <div class=\"srp-ysfWrap boxSize\">\n\n\n\n <diV. etc.............
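One way to confirm this without reading the whole body is to decode it and list its top-level keys. The sketch below is only an illustration: `sample_body` is a shortened, hypothetical stand-in for `req.text` (only the key names `html_ysf` and `html2` come from this thread; the values are invented), and `describe_json_body` is a helper name made up for this example.

```python
import json

# Hypothetical, shortened stand-in for req.text; the real body is much larger.
sample_body = json.dumps({
    "html_ysf": "<div class=\"srp-ysfWrap boxSize\"></div>",
    "html2": "<div title=\"View property details\"></div>",
})


def describe_json_body(body):
    """Map each top-level key of a JSON object body to the length of its value."""
    payload = json.loads(body)
    return {key: len(str(value)) for key, value in payload.items()}


summary = describe_json_body(sample_body)
for key, size in summary.items():
    # Long string values are likely to be embedded HTML fragments.
    print(key, size)
```

With a live response, passing `req.text` instead of `sample_body` should show which key holds the listing markup.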
So to get what you want, just use the html2 key to extract the HTML you are interested in:
req = requests.post(url='http://www.99acres.com/do/quicksearch/getresults_ajax', data=d, headers=h)
r = html.fromstring(req.json()["html2"])
prices = r.xpath('//div[@title = "View property details"]')
print('test %d' % len(prices))
for price in prices:
    print('price is this ' + str(price))
Each price is a div element, so if we run:
for price in prices:
    print(html.tostring(price))
We get output like the following:
b'<div data-propid="Q26021619" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n <input id="dataSRPCLKTRK" type="hidden" value="ON">\n <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n <b class="WebRupee f14 mr5"> ₹</b> <b id="rs_Q26021619">18,000</b>\n <a data-proppos="\'\'" id="desc_Q26021619" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a> </div>\n <i class="uline" data-maplatlngzm="13.06709,80.2195432,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619" data-price="18,000," data-area="Super built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i> <div class="clr"></div>\n\t\t</div>\n \n \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Super built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:  </span> <span>On Rent </span><span> <span>/ </span> 1 to 5 years old </span><span> <span>/ </span> Unfurnished </span><span> <span>/ </span> 2nd Floor (out of 3) </span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n Near gandhi road\nGood locality, Calm atmosphere\nCall for more details\t\t\t\t</div>\n <div class="rel clr">\n <div class="lf mt13 mr13">Features: </div>\n <div class="iconDiv fc_icons fcInit" attr="4,5,24,">\n <i class="i4" value="Reserved Parking"> </i><i class="i5" value="Feng Shui / Vaastu Compliant"> </i><i class="i24" value="Water Storage"> </i> </div>\n \n <div class="LyrIcon clkEvntStp top0imp"></div>\n </div>\n \t\t\t</div>\n <div class="clr p5"></div>\n <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>     Posted : Today \n </div> \n \t\t</div>\n <div class="clr"></div>\n <div data-srptrk="ntrck" class="srpAction m10 mt5">\n \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail & SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMTggSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoQ26021619" title="View Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_Q26021619" class="sl_container blkImp f15 lf mt5 mr10"><span 
class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n \t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n </div>\n <div class="abs verifyLbl ViconPosSrp">\n <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n </div>\n <i class="uiIcon verified mt8"></i>\n </div>\n \t\t<div class="clr pdt10"></div>\n </div> \n\n'
b'<div data-propid="X22163381" data-pgid="QS" class="srpWrap " title="View property details" data-fsl="N">\n\t\t<input id="ajxPDFlg" type="hidden" value="najx">\n <input id="dataSRPCLKTRK" type="hidden" value="ON">\n <i class="uiIcon pLatinum"></i>\t\t<div class="wrapttl">\n\t\t\t<div class="_srpttl srpttl fwn wdthFix480 lf">\n <b class="WebRupee f14 mr5"> ₹</b> <b id="rs_X22163381">22,000</b>\n <a data-proppos="\'\'" id="desc_X22163381" class="b wWrap" target="_blank" title="2 BHK, Residential Apartment for rent in Choolaimedu" href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-fsl="N">2 BHK, Residential Apartment for rent in Choolaimedu</a> </div>\n <i class="uline" data-maplatlngzm="13.0673818,80.2213615,11" data-iwdesc=" Residential Apartment for rent in Choolaimedu" data-ttlurl="http://www.99acres.com/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-r2-spid-X22163381" data-price="22,000, @ <span class=WebRupee>₹ </span>22/ Sq.Ft." data-area="Built-up ,1000,Sq.Ft." 
data-bedrm="2" data-bldname="On Request" title="View Map"><i class="uiIcon imap"></i><i class="ml_5 f13 vmid hverU">Map</i></i> <div class="clr"></div>\n\t\t</div>\n \n \n\t\t<div class="srpDetail">\n\t\t\t<div class="srpImg rel">\n <img class="imgBoxSrp lazy" alt="2 BHK, Residential Apartment for rent in Choolaimedu" width="208" height="150" data-original="http://static.99acres.com/images/srpimages/noproperty-new.png" src="http://static.99acres.com/images/i0.gif"><div class="imgCap" data-clk-json=\'{"sno":-1,"ids":"0;732;","phType":"PROP","index":0,"text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'><a class="trackVamRos" vamacttype="Locality_Video_Count" vamactsrc="RENT_SRP" data-trkctgry="CLICK_LOCALITY_VIDEO_LINK" data-blid="732" href="#" data-clk-json=\'{"vtag":"LOC","sno":-1,"tab":4,"ids":"0;732;","phType":"PROP","entity":"locimages","subtab":"LVIDEO","text":"Sri Sakthi Real Estate","classLabel":"Dealer","profileId":"1122559","bedroomNum":"2","src":"SRP"}\'>1 Locality Video</a><div class="clr"></div></div>\t\t\t</div>\n\t\t\t<div class="srpDataWrap"><span>Built-up Area : <b>1000 Sq.Ft. 
</b></span><div class="clr pdt8"></div><span class="doElip">Society : <bclass>On Request</bclass></span><div class="sep clr mt3imp"></div><span><span>Highlights:  </span> <span>On Rent </span><span> <span>/ </span> 1 to 5 years old </span><span> <span>/ </span> Furnished </span><span> <span>/ </span> 1st Floor (out of 4) </span></span><div class="sep clr"></div>\t\t\t\t<div class="lf f12 wBr">\n\t\t\t\t\t<b>Description :</b> \n 2bhk house on rent in choolaimedu , Gill nagar area with all nessesary facilties.\t\t\t\t</div>\n \t\t\t</div>\n <div class="clr p5"></div>\n <div class="lf f13 hm10 mb5">Dealer : <a data-pid="1122559" class="hverU blkImp srpTplTrck" title="Sri Sakthi Real Estate , Chennai Central" target="_blank" href="/sri-sakthi-real-estate-chennai-central-drid-1122559">Sri Sakthi Real Estate</a>     Posted : Today \n </div> \n \t\t</div>\n <div class="clr"></div>\n <div data-srptrk="ntrck" class="srpAction m10 mt5">\n \t\t<a data-mxid="" data-apid="1122559" data-mc="N" data-rc="R" data-cl="Dealer" data-pgid="QS" href="javascript:void(0);" class="srpBlue f13 mr10 lf cntClk" title="Send E-mail & SMS"> Contact Dealer <i>FREE</i></a><a data-pgid="QS" data-src="listing rank" data-lst="P" data-sms="RGVhciBBRERfQlVZRVJOQU1FX0hFUkUsIHlvdSBtYXkgY29udGFjdCBCYWJ1IGF0ICs5MS05Nzg5MDc0NzQxIGZvciBJTlIgMjIgSyAxMDAwIFNxLiBGdC4gRmxhdCBpbiBDaG9vbGFpbWVkdS4=" data-trksrc="listing rank" data-ttc="" href="javascript:void(0);" class="srpWhite f13 mr10 lf vpn" id="viewphnoX22163381" title="View Phone Number">View Phone Number</a><div data-src="listing rank" id="prop_X22163381" class="sl_container blkImp f15 lf mt5 mr10"><span class="sl_star_empty_container" title="Shortlist this property"><i class="lf uiIcon sl_star_empty"></i><span class="lf m5">Shortlist</span></span></div>\t <div class="lf mt5 rptLtng" data-cl="A" data-md="R" data-pid="1122559" data-proptype="1" data-photocount="0" data-rescom="R">\n\t\t<div class="row dwnSrp"> \n\t\t<i class="spdpIcn repot_acu"></i> \n 
\t\t<a class="f13 b delCh blLink">Report problem with listing</a>\n\t </div>\n\t </div>\n </div>\n <div class="abs verifyLbl ViconPosSrp">\n <div id="tooltipSociety" class="infoTip2 fwn f13 ital r5 hide VlyrPosSrp">\n Learn about our verification process <a id="verify_process_info" class="blLink uLine" href="javascript:void(0)" style="text-decoration:underline">here</a>.\n <i class="ver-arrow-down abs" style="left: 80px; bottom: -12px;"></i>\n </div>\n <i class="uiIcon verified mt8"></i>\n </div>\n \t\t<div class="clr pdt10"></div>\n </div> \n\n'
So whatever you want has to be extracted from those elements.
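As a rough sketch of that extraction, the example below pulls the listing URL, title and price out of each div. It assumes the markup keeps the shape seen in the output above (an `a` element whose id starts with `desc_` and a `b` element whose id starts with `rs_`). `sample_html2` is a trimmed copy of the first listing, and `BASE_URL` and `extract_listings` are names introduced here for illustration only.

```python
from urllib.parse import urljoin

from lxml import html

# Assumed site root, used to turn the relative hrefs into absolute URLs.
BASE_URL = "http://www.99acres.com/"

# Trimmed copy of one listing from the output above (most attributes removed).
sample_html2 = '''
<div data-propid="Q26021619" class="srpWrap " title="View property details">
  <b class="WebRupee f14 mr5"> ₹</b> <b id="rs_Q26021619">18,000</b>
  <a id="desc_Q26021619" class="b wWrap"
     title="2 BHK, Residential Apartment for rent in Choolaimedu"
     href="/2-bhk-bedroom-apartment-flat-for-rent-in-choolaimedu-chennai-central-1000-sq-ft-spid-Q26021619">2 BHK, Residential Apartment for rent in Choolaimedu</a>
</div>
'''


def extract_listings(fragment):
    """Return one dict per listing div found in an HTML fragment."""
    root = html.fromstring(fragment)
    listings = []
    for div in root.xpath('//div[@title="View property details"]'):
        # Search relative to this listing so values from other divs are ignored.
        links = div.xpath('.//a[starts-with(@id, "desc_")]')
        prices = div.xpath('.//b[starts-with(@id, "rs_")]/text()')
        listings.append({
            "id": div.get("data-propid"),
            "url": urljoin(BASE_URL, links[0].get("href")) if links else None,
            "title": links[0].get("title") if links else None,
            "price": prices[0].strip() if prices else None,
        })
    return listings


listings = extract_listings(sample_html2)
for listing in listings:
    print(listing["url"], listing["price"])
```

Against the live response, the same function could be called as `extract_listings(req.json()["html2"])`. The `if links else None` guards are there because a listing without the expected elements would otherwise raise an IndexError.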