Using Python

Time: 2017-01-31 10:35:36

Tags: python python-3.x selenium drop-down-menu web-scraping

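I am new to Python and am trying to retrieve data from this site using Python 3.6.0.

There are two drop-down lists, and the options in the second depend on the selection made in the first.

First: 'Organizasyon Adi'; second: 'UEVCB Adi'.

In the page source, the options look like this: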

<option value="0" selected="selected">TÜMÜ</option> #this is default value when we open the page
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>

These are the options of the first drop-down; there are nearly 800 of them.
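
Since the organisation options are present in the page source, they could in principle be collected once and reused as search terms. The following is a minimal sketch using only the standard library; `OptionCollector` and `sample_html` are illustrative names, and the sample markup is copied from the three options shown above. With a live browser the input would presumably be `driver.page_source`.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect {value: label} pairs from <option> elements."""

    def __init__(self):
        super().__init__()
        self.options = {}
        self._current_value = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # Remember the value attribute until the closing tag arrives.
            self._current_value = dict(attrs).get("value")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            label = "".join(self._text_parts).strip()
            self.options[self._current_value] = label
            self._current_value = None


# Sample markup taken from the options listed in the question.
sample_html = """
<option value="0" selected="selected">TÜMÜ</option>
<option value="10374">1461 TRABZON ELEKTRİK ÜRETİM A.Ş</option>
<option value="9426">2M ELEKTRİK ÜRETİM SANAYİ VE TİCARET ANONİM ŞİRKETİ</option>
"""

collector = OptionCollector()
collector.feed(sample_html)
collector.close()

# Drop the default "TÜMÜ" entry (value "0"), which is not a real organisation.
organisations = {v: name for v, name in collector.options.items() if v != "0"}
```

The resulting dictionary maps each option value to its visible label, so the label (or its leading part, such as '1461') can be typed into the drop-down's search box.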

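The options of the second drop-down cannot be seen without inspecting the page unless we click on the second drop-down box. (After a click, both drop-downs open a search box.)

The second drop-down opens the list of units belonging to the selected organisation.

When an option is selected in both drop-downs, the page generates a data table. We are trying to get this data for all units.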

I could not scrape the data for all units with a single program, so I decided to scrape them one at a time.
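
For reference, scraping every unit amounts to a nested loop: for each organisation, list its units, then fetch the table for each pair. The sketch below shows only that control flow. `scrape_all`, `list_units` and `fetch_table` are hypothetical names, and the two helpers are replaced by fake stand-ins with made-up data so the example runs without a browser; in a real script they would wrap the Selenium steps.

```python
def scrape_all(organisations, list_units, fetch_table):
    """Return {(organisation name, unit name): table} for every combination.

    organisations: dict mapping option value -> organisation name
    list_units:    callable(org_id) -> list of unit names (hypothetical helper)
    fetch_table:   callable(org_id, unit) -> scraped table (hypothetical helper)
    """
    results = {}
    for org_id, org_name in organisations.items():
        for unit in list_units(org_id):
            results[(org_name, unit)] = fetch_table(org_id, unit)
    return results


# Fake stand-ins with made-up data, used only to exercise the loop.
fake_units = {"10374": ["UNIT A"], "9426": ["UNIT B", "UNIT C"]}


def fake_list_units(org_id):
    return fake_units[org_id]


def fake_fetch_table(org_id, unit):
    return [[org_id, unit]]


tables = scrape_all(
    {"10374": "ORG ONE", "9426": "ORG TWO"},
    fake_list_units,
    fake_fetch_table,
)
```

Keeping the browser interaction behind two small functions also makes it easier to retry or skip a single unit when one selection fails.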

This is the code I am using:

 
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
import time
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)
organisation = driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_label']")
organisation.click()
dropdown1 =  driver.find_element_by_xpath(".//*[@id='j_idt102:distributionId_filter']")
dropdown1.send_keys('1461')
dropdown1.send_keys(u'\ue007')
unit = driver.find_element_by_id('j_idt102:uevcb_label')
dropdown2 = driver.find_element_by_xpath(".//*[@id='j_idt102:uevcb_filter']")
dropdown2.send_keys('SAMA')
dropdown2.send_keys(u'\ue007')
apply= driver.find_element_by_xpath("//*[@id='j_idt102:goster']")
apply.click()
time.sleep(5)

soup = BeautifulSoup(driver.page_source)

table = soup.find_all('table')[0]
rows = table.find_all('tr')[1:]

data = {
    '01.Date' : [],
    '02.Hour' : [],
    '03.NaturalGas' : [],
    '04.Wind' : [],
    '05.Lignite' : [],
    '06.Hard_Coal' : [],
    '07.ImportedCoal' : [],
    '08.Geothermal' : [],
    '09.Hydro_Dam' : [],
    '10.Naphta' : [],
    '11.Biomass' : [],
    '12.River' : [],
    '13.Other' : []
}

for row in rows:
    cols = row.find_all('td')
    data['01.Date'].append( cols[0].get_text() )
    data['02.Hour'].append( cols[1].get_text() )
    data['03.NaturalGas'].append( cols[3].get_text() )
    data['04.Wind'].append( cols[4].get_text() )
    data['05.Lignite'].append( cols[5].get_text() )
    data['06.Hard_Coal'].append( cols[6].get_text() )
    data['07.ImportedCoal'].append( cols[7].get_text() )
    data['08.Geothermal'].append( cols[8].get_text() )
    data['09.Hydro_Dam'].append( cols[9].get_text() )
    data['10.Naphta'].append( cols[10].get_text() )
    data['11.Biomass'].append( cols[11].get_text() )
    data['12.River'].append( cols[12].get_text() )
    data['13.Other'].append( cols[13].get_text() )

df = pd.DataFrame( data )
writer = pd.ExcelWriter('//192.168.0.102/Data/kgup.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Sheet1')
writer.save()
time.sleep(5)
driver.close()
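
The thirteen near-identical `append` lines above could be driven by a single mapping from column name to cell index. The sketch below assumes the same indices as the original loop (index 2 is skipped there as well) and takes each row as a plain list of cell strings; `COLUMNS` and `rows_to_columns` are illustrative names, and the sample row is made up.

```python
# Column name -> index of the <td> cell, mirroring the original loop.
COLUMNS = {
    '01.Date': 0,
    '02.Hour': 1,
    '03.NaturalGas': 3,
    '04.Wind': 4,
    '05.Lignite': 5,
    '06.Hard_Coal': 6,
    '07.ImportedCoal': 7,
    '08.Geothermal': 8,
    '09.Hydro_Dam': 9,
    '10.Naphta': 10,
    '11.Biomass': 11,
    '12.River': 12,
    '13.Other': 13,
}


def rows_to_columns(rows):
    """Turn a list of rows (lists of cell texts) into a dict of column lists."""
    data = {name: [] for name in COLUMNS}
    for cells in rows:
        for name, index in COLUMNS.items():
            data[name].append(cells[index])
    return data


# Made-up row with 14 cells, standing in for the text of one <tr>.
sample_rows = [[str(i) for i in range(14)]]
data = rows_to_columns(sample_rows)
```

With BeautifulSoup, each row could be converted first with something like `[td.get_text() for td in row.find_all('td')]`, and the resulting dictionary passed to `pd.DataFrame` as before.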

With this code, we can select an option from the first drop-down by using its search box and the Enter key.

When it comes to the second drop-down, the script raises ImportError: sys.meta_path is None, Python is likely shutting down
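
This message is often a secondary symptom: it tends to appear when cleanup code runs while the interpreter is already exiting, typically after an earlier exception has ended the script. One way to keep cleanup out of interpreter shutdown is to close the browser explicitly in a `finally` block. The sketch below illustrates the pattern under that assumption; `managed_driver` is an illustrative name, and `FakeDriver` is a stand-in so the example runs without a browser.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via `factory` and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs whether the body succeeded or raised.
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver; only records whether quit() was called."""

    def __init__(self):
        self.closed = False

    def quit(self):
        self.closed = True


fake = FakeDriver()
error_message = None
try:
    with managed_driver(lambda: fake) as driver:
        # Simulated scraping failure, e.g. an element that was not found.
        raise RuntimeError("element not found")
except RuntimeError as exc:
    error_message = str(exc)
```

With Selenium, the call would presumably be `managed_driver(webdriver.Chrome)`. The original exception is still raised, so the real cause of the failure stays visible in the traceback.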

How can I handle this?

Thanks.

1 answer:

Answer 0 (score: 0)

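Your code seems to be prone to StaleElementReferenceException as well as to the exception Element is not clickable at point... Try the code below for the web-scraping part and let me know the result: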

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup
import pandas as pd 

url = 'https://seffaflik.epias.com.tr/transparency/uretim/planlama/kgup.xhtml' #
driver = webdriver.Chrome()
driver.get(url)
wait = WebDriverWait(driver, 20)
driver.maximize_window()

wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:distributionId_label'))).click() # organization drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:distributionId_filter'))).send_keys('1461' + u'\ue007') # select required
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt179_modal'))) # wait until modal disappeared
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:uevcb_label'))).click() # unit drop-down
wait.until(EC.element_to_be_clickable((By.ID, 'j_idt102:uevcb_filter'))).send_keys('SAMA' + u'\ue007') # select unit
wait.until(EC.element_to_be_clickable((By.ID,'j_idt102:goster'))).click() # click Apply
wait.until_not(EC.visibility_of_element_located((By.ID,'j_idt15'))) # wait until modal disappeared

soup = BeautifulSoup(driver.page_source, 'html.parser') # name the parser explicitly to avoid the bs4 warning
....
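
The code above replaces fixed `time.sleep` calls with explicit waits, which poll a condition until it holds or a timeout expires. The following is a simplified stand-in for that idea using only the standard library. It is not Selenium's implementation; `wait_until` and `table_ready` are illustrative names.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Demo: a condition that only becomes true on the third check.
attempts = {"count": 0}


def table_ready():
    attempts["count"] += 1
    return "table" if attempts["count"] >= 3 else None


found = wait_until(table_ready, timeout=2.0, poll=0.01)
```

Compared with a fixed sleep, the script continues as soon as the condition holds and fails with a clear timeout error when it never does.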