Web scraping a PHP-based website with the selenium module in Python

Date: 2016-06-20 10:31:46

Tags: python-3.x web-scraping

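I am using the selenium module in Python to scrape a PHP-based website. I can't right-click and inspect elements, but if I go to Open Menu > Developer > Page Source, I can see the page source of the site. It contains the following HTML: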

<datalist id="observatories" style="font: Verdana; font-size:10px; width:142px; text-transform:uppercase; background-color: #eeffee;">
      <option value='ADIRAMAPATINAM'>ADIRAMAPATINAM</option><option value='AGARTALA'>AGARTALA</option><option value='AGATTI'>AGATTI</option><option value='AGRA (IAF)'>AGRA (IAF)</option><option value='AHMEDABAD'>AHMEDABAD</option><option value='AJMER'>AJMER</option><option value='AKOLA (A)'>AKOLA (A)</option><option value='ALAPUZHA (ALLEPPEY)'>ALAPUZHA (ALLEPPEY)</option><option value='ALLAHABAD'>ALLAHABAD</option><option value='ALMORA'>ALMORA</option><option value='AMBALA'>AMBALA</option><option value='AMBIKAPUR'>AMBIKAPUR</option><option value='AMINI DIVI'>AMINI DIVI</option><option value='AMRITSAR'>AMRITSAR</option><option value='ANANDPUR SAHIB'>ANANDPUR SAHIB</option><option value='ANANTAPUR'>ANANTAPUR</option><option value='ANNI'>ANNI</option><option value='AURANGABAD (BIHAR)'>AURANGABAD (BIHAR)</option><option value='AURANGABAD (CHIKALTHANA)'>AURANGABAD (CHIKALTHANA)</option><option value='BAHRAICH'>BAHRAICH</option><option value='BALASORE'>BALASORE</option><option value='BANIHAL'>BANIHAL</option><option value='BANKURA'>BANKURA</option><option value='BAPATLA'>BAPATLA</option><option value='BAREILLY'>BAREILLY</option><option value='BARMER'>BARMER</option><option value='BARPETA'>BARPETA</option><option value='BATOTE'>BATOTE</option><option value='BELGAUM (SAMBRE) (A)'>BELGAUM (SAMBRE) (A)</option><option value='BENGALURU'>BENGALURU</option><option value='BHADERWAH'>BHADERWAH</option><option value='BHAGALPUR'>BHAGALPUR</option><option value='BHAVNAGAR (A)'>BHAVNAGAR (A)</option><option value='BHAWANI PATNA'>BHAWANI PATNA</option><option value='BHOPAL'>BHOPAL</option><option value='BHUBANESHWAR'>BHUBANESHWAR</option><option value='BHUJ'>BHUJ</option><option value='BHUNTAR'>BHUNTAR</option><option value='BIKANER'>BIKANER</option><option value='BILASPUR (HP)'>BILASPUR (HP)</option><option value='BOKARO'>BOKARO</option><option value='CANNANORE (KANNUR)'>CANNANORE (KANNUR)</option><option value='CANNING'>CANNING</option><option value='CHAMBA (HP)'>CHAMBA 
(HP)</option><option value='CHAMPAWAT'>CHAMPAWAT</option><option value='CHAMPHAI'>CHAMPHAI</option><option value='CHANDBALI'>CHANDBALI</option><option value='CHANDIGARH'>CHANDIGARH</option><option value='CHENNAI (MINAMBAKKAM) (A)'>CHENNAI (MINAMBAKKAM) (A)</option><option value='CHENNAI (NUNGAMBAKKAM)'>CHENNAI (NUNGAMBAKKAM)</option><option value='CHERRAPUNJI'>CHERRAPUNJI</option><option value='CHITRADURGA'>CHITRADURGA</option><option value='CHURU'>CHURU</option><option value='COIMBATORE (PEELAMEDU) (A)'>COIMBATORE (PEELAMEDU) (A)</option><option value='COOCH BEHAR'>COOCH BEHAR</option><option value='CUDDALORE'>CUDDALORE</option><option value='DAHANU'>DAHANU</option><option value='DALTONGANJ'>DALTONGANJ</option><option value='DEESA'>DEESA</option><option value='DEHRA DUN'>DEHRA DUN</option><option value='DELHI UNIVERSITY'>DELHI UNIVERSITY</option><option value='DEOGARH'>DEOGARH</option><option value='DHUBRI'>DHUBRI</option><option value='DIAMOND HARBOUR'>DIAMOND HARBOUR</option><option value='DIBRUGARH'>DIBRUGARH</option><option value='DIGHA'>DIGHA</option><option value='DIMAPUR'>DIMAPUR</option><option value='DWARKA'>DWARKA</option><option value='FARIDABAD'>FARIDABAD</option><option value='FURSATGANJ'>FURSATGANJ</option><option value='GADAG'>GADAG</option><option value='GANGTOK'>GANGTOK</option><option value='GANNAVARAM'>GANNAVARAM</option><option value='GAYA'>GAYA</option><option value='GIRIDIH'>GIRIDIH</option><option value='GOPALPUR'>GOPALPUR</option><option value='GORAKHPUR'>GORAKHPUR</option><option value='GULBARGA'>GULBARGA</option><option value='GUNA'>GUNA</option><option value='GUWAHATI'>GUWAHATI</option><option value='GWALIOR'>GWALIOR</option><option value='GYALSINGH'>GYALSINGH</option><option value='HALDIA'>HALDIA</option><option value='HAMIRPUR (HP)'>HAMIRPUR (HP)</option><option value='HANUMANGARH'>HANUMANGARH</option><option value='HARIDWAR'>HARIDWAR</option><option value='HARNAI'>HARNAI</option><option value='HIRAKUD'>HIRAKUD</option><option 
value='HISSAR'>HISSAR</option><option value='HONAVAR'>HONAVAR</option><option value='HOSHANGABAD'>HOSHANGABAD</option><option value='HYDERABAD'>HYDERABAD</option><option value='IMPHAL (TULIHAL)'>IMPHAL (TULIHAL)</option><option value='INDORE'>INDORE</option><option value='ITANAGAR'>ITANAGAR</option><option value='JABALPUR'>JABALPUR</option><option value='JAGDALPUR'>JAGDALPUR</option><option value='JAIPUR'>JAIPUR</option><option value='JAISALMER'>JAISALMER</option><option value='JALPAIGURI'>JALPAIGURI</option><option value='JAMMU'>JAMMU</option><option value='JAMSHEDPUR (A)'>JAMSHEDPUR (A)</option><option value='JANJGIR'>JANJGIR</option><option value='JHANSI'>JHANSI</option><option value='JHARSUGUDA'>JHARSUGUDA</option><option value='JODHPUR'>JODHPUR</option><option value='JORHAT'>JORHAT</option><option value='KAILASHAHAR'>KAILASHAHAR</option><option value='KAKINADA'>KAKINADA</option><option value='KALINGAPATANAM'>KALINGAPATANAM</option><option value='KANYAKUMARI'>KANYAKUMARI</option><option value='KARAIKAL'>KARAIKAL</option><option value='KARIPUR'>KARIPUR</option><option value='KARNAL'>KARNAL</option><option value='KARWAR'>KARWAR</option><option value='KATRA'>KATRA</option><option value='KAVALI'>KAVALI</option><option value='KEONJHARGARH'>KEONJHARGARH</option><option value='KHAJURAHO'>KHAJURAHO</option><option value='KOCHI (CIAL)'>KOCHI (CIAL)</option><option value='KODAIKANAL'>KODAIKANAL</option><option value='KOLHAPUR'>KOLHAPUR</option><option value='KOLKATA'>KOLKATA</option><option value='KOLKATA (DUM DUM) (A)'>KOLKATA (DUM DUM) (A)</option><option value='KOTA (A)'>KOTA (A)</option><option value='KOZHIKODE (CALICUT)'>KOZHIKODE (CALICUT)</option><option value='KUKERNAG'>KUKERNAG</option><option value='KUPWARA'>KUPWARA</option><option value='KURNOOL'>KURNOOL</option><option value='KURUKSHETRA'>KURUKSHETRA</option><option value='LENGPUI'>LENGPUI</option><option value='LUCKNOW'>LUCKNOW</option><option value='LUDHIANA'>LUDHIANA</option><option 
value='MACHILIPATNAM'>MACHILIPATNAM</option><option value='MADHUBANI'>MADHUBANI</option><option value='MADIKERI'>MADIKERI</option><option value='MADURAI (A)'>MADURAI (A)</option><option value='MAHABALESHWAR'>MAHABALESHWAR</option><option value='MALDA'>MALDA</option><option value='MANGALORE (BAJPE)'>MANGALORE (BAJPE)</option><option value='MANGALORE (PANAMBUR)'>MANGALORE (PANAMBUR)</option><option value='MANGAN'>MANGAN</option><option value='MEERUT'>MEERUT</option><option value='MINICOY'>MINICOY</option><option value='MOKOKCHUNG'>MOKOKCHUNG</option><option value='MORMUGAO'>MORMUGAO</option><option value='MUKTESWAR (KUMAUN)'>MUKTESWAR (KUMAUN)</option><option value='MUMBAI (SANTACRUZ)'>MUMBAI (SANTACRUZ)</option><option value='MUZAFFARNAGAR'>MUZAFFARNAGAR</option><option value='NAGAPATTINAM'>NAGAPATTINAM</option><option value='NAGPUR'>NAGPUR</option><option value='NALANDA'>NALANDA</option><option value='NALIYA'>NALIYA</option><option value='NAMCHI'>NAMCHI</option><option value='NARELA'>NARELA</option><option value='NARSAPUR'>NARSAPUR</option><option value='NASHIK'>NASHIK</option><option value='NELLORE'>NELLORE</option><option value='NEW DELHI'>NEW DELHI</option><option value='NEW DELHI (AYANAGAR)'>NEW DELHI (AYANAGAR)</option><option value='NEW DELHI (PALAM)'>NEW DELHI (PALAM)</option><option value='NIZAMABAD'>NIZAMABAD</option><option value='NORTH LAKHIMPUR'>NORTH LAKHIMPUR</option><option value='OKHA'>OKHA</option><option value='ONGOLE'>ONGOLE</option><option value='PAHALGAM'>PAHALGAM</option><option value='PALI'>PALI</option><option value='PAMBAN'>PAMBAN</option><option value='PANJIM'>PANJIM</option><option value='PANTNAGAR'>PANTNAGAR</option><option value='PARADIP PORT'>PARADIP PORT</option><option value='PARBHANI'>PARBHANI</option><option value='PASIGHAT'>PASIGHAT</option><option value='PATIALA'>PATIALA</option><option value='PATNA'>PATNA</option><option value='PENDRA ROAD'>PENDRA ROAD</option><option value='PITHORAGARH'>PITHORAGARH</option><option 
value='PONDICHERRY'>PONDICHERRY</option><option value='PORBANDAR'>PORBANDAR</option><option value='PORT BLAIR'>PORT BLAIR</option><option value='PUNE'>PUNE</option><option value='PURI'>PURI</option><option value='PURNEA'>PURNEA</option><option value='QUAZIGUND'>QUAZIGUND</option><option value='RAIPUR'>RAIPUR</option><option value='RAJGIR'>RAJGIR</option><option value='RAJKOT'>RAJKOT</option><option value='RAMGUNDAM'>RAMGUNDAM</option><option value='RANCHI'>RANCHI</option><option value='RATNAGIRI'>RATNAGIRI</option><option value='RIDGE'>RIDGE</option><option value='SAGAR'>SAGAR</option><option value='SALEM'>SALEM</option><option value='SAMBALPUR'>SAMBALPUR</option><option value='SANGLI'>SANGLI</option><option value='SATARA'>SATARA</option><option value='SATNA'>SATNA</option><option value='SHAHAJAHANPUR'>SHAHAJAHANPUR</option><option value='SHILLONG'>SHILLONG</option><option value='SHIMLA'>SHIMLA</option><option value='SILCHAR'>SILCHAR</option><option value='SIRSA'>SIRSA</option><option value='SOLAPUR'>SOLAPUR</option><option value='SRI NIKETAN'>SRI NIKETAN</option><option value='SRIGANGANAGAR'>SRIGANGANAGAR</option><option value='SRINAGAR'>SRINAGAR</option><option value='SULTANPUR'>SULTANPUR</option><option value='SUNDERNAGAR'>SUNDERNAGAR</option><option value='SURAT'>SURAT</option><option value='TADONG'>TADONG</option><option value='TAWANG'>TAWANG</option><option value='TEHRI'>TEHRI</option><option value='TEZPUR'>TEZPUR</option><option value='THANE'>THANE</option><option value='THIRUVANANTHAPURAM'>THIRUVANANTHAPURAM</option><option value='THIRUVANANTHAPURAM (A)'>THIRUVANANTHAPURAM (A)</option><option value='TIRUCHIRAPALLI (A)'>TIRUCHIRAPALLI (A)</option><option value='TIRUPATHI'>TIRUPATHI</option><option value='TONDI'>TONDI</option><option value='TUNI'>TUNI</option><option value='UDAIPUR'>UDAIPUR</option><option value='UDAIPUR (DABOK)'>UDAIPUR (DABOK)</option><option value='VADODARA (A)'>VADODARA (A)</option><option value='VARANASI'>VARANASI</option><option 
value='VARANASI (BABATPUR)'>VARANASI (BABATPUR)</option><option value='VELLORE'>VELLORE</option><option value='VERAVAL'>VERAVAL</option><option value='VISAKHAPATNAM'>VISAKHAPATNAM</option><option value='VISHAKHAPATNAM'>VISHAKHAPATNAM</option></datalist>
Search&nbsp;&nbsp;&nbsp;: &nbsp;&nbsp;&nbsp;
<input name="obs_name" type="text" list="observatories" placeholder="City Name" style="font: Verdana; font-size:12px; width:250px; text-transform:uppercase; background-color: #eeffee; padding-left:3px;">
<input name="submit" type="submit" value="Go" style="font:Verdana; font-size:12px;" >

The corresponding text box and button look like this:

[Screenshot: the "City Name" search text box and the Go button]
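Neither `<input>` in this HTML has an `id` attribute: the text box is identified by `name="obs_name"` and the Go button by `name="submit"`. As a quick check, here is a minimal sketch that runs Python's standard-library `html.parser` over an abbreviated copy of the snippet (only three of the options kept). `FormInspector` and `SNIPPET` are illustrative names, not part of the site or of Selenium.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect <datalist> option values and <input> attributes from an HTML snippet."""

    def __init__(self):
        super().__init__()
        self.options = []   # value of every <option> found inside a <datalist>
        self.inputs = []    # attribute dict of every <input>
        self._in_datalist = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "datalist":
            self._in_datalist = True
        elif tag == "option" and self._in_datalist:
            self.options.append(attrs.get("value"))
        elif tag == "input":
            self.inputs.append(attrs)

    def handle_endtag(self, tag):
        if tag == "datalist":
            self._in_datalist = False


# Abbreviated copy of the page's HTML (most options omitted)
SNIPPET = """
<datalist id="observatories">
  <option value='AGRA (IAF)'>AGRA (IAF)</option><option value='PUNE'>PUNE</option><option value='PURI'>PURI</option>
</datalist>
<input name="obs_name" type="text" list="observatories" placeholder="City Name">
<input name="submit" type="submit" value="Go">
"""

inspector = FormInspector()
inspector.feed(SNIPPET)
for attrs in inspector.inputs:
    print(attrs.get("name"), attrs.get("type"), attrs.get("id"))
# obs_name text None
# submit submit None
print("PUNE" in inspector.options)  # True
```

The `None` in the last column is the point: any locator that looks for an `id` on these inputs will not find them, while a lookup by `name` will.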

My Python code is:

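from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

browser = webdriver.Firefox()
url = 'http://www.imd.gov.in/pages/city_weather_main.php'
browser.get(url)  # open the IMD city weather page
# Attempt: select PUNE by clicking its <option> inside the <datalist>
browser.find_element(By.XPATH, '//*[@id="observatories"]/option[contains(text(), "PUNE")]').click()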

I want to type PUNE into the text box and then click Go.

How can I do this?

1 answer:

Answer 0 (score: 0)

You need to locate the Search text input element and then call send_keys() on it:

from selenium.webdriver.common.by import By
browser.find_element(By.NAME, 'obs_name').send_keys('PUNE')  # the box has name="obs_name" and no id