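Here is the link whose button link text I want to extract, but I am unable to do so. After opening the website, I choose an option from "Select Product". Say I choose the first option, "Acrylic Coatings": three types then appear, namely "Primers", "Intermediates" and "Finishes". I want to extract the text of those three links, but I can't get it to work.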
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome('~/chromedriver.exe')
driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')
lst_name = ['Acrylic Coatings','Glass Flake Coatings']
for i in lst_name:
    print(i)
    driver.find_element_by_xpath("//select[@name='txtProduct']/option[text()="+"'"+str(i)+"'"+"]").click()
    page = requests.get("http://www.asianpaintsppg.com/applications/protective_products.aspx")
    soup = BeautifulSoup(page.content, 'html.parser')
    for div in soup.findAll('table', attrs={'id':'dataLstSubCat'}):
        print(div.find('a')['href'])
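But I get empty values here. Any help would be appreciated.
Answer 0: (score: 2)
You can get the subcategories without using Selenium. Try a POST request like the one shown below.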
import requests
from bs4 import BeautifulSoup

url = "http://www.asianpaintsppg.com/applications/protective_products.aspx"

with requests.Session() as s:
    r = s.get(url)
    soup = BeautifulSoup(r.text,"lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['txtProduct'] = '2' #This is the dropdown number
    res = s.post(url,data=payload)
    sauce = BeautifulSoup(res.text,"lxml")
    subcat = [item.text for item in sauce.select("[id^='dataLstSubCat_']")]
    print(subcat)
Output you may get:
['Primers', 'Intermediates', 'Finishes']
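
To see what the payload step is doing without touching the network, here is a minimal sketch that mirrors the dict comprehension above using only the standard library. The helper names (`InputCollector`, `build_payload`) and the sample markup are made up for illustration; the real page's hidden fields will have different names and values. Note that `txtProduct` is a `<select>`, so it is not picked up from the `<input>` elements and has to be set by hand.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect name/value pairs from every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if "name" in attrs:
            # A missing value attribute becomes '', like i.get('value', '').
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_payload(html, product_value):
    """Return the named input fields from html plus the chosen dropdown value."""
    collector = InputCollector()
    collector.feed(html)
    payload = dict(collector.fields)
    # The dropdown is a <select>, so its value is added explicitly.
    payload["txtProduct"] = product_value
    return payload


# Hypothetical, heavily shortened form markup (not copied from the real page).
SAMPLE_HTML = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789">
  <input type="text" name="txtSearch">
  <input type="submit" value="Go">
</form>
"""

payload = build_payload(SAMPLE_HTML, "2")
print(payload)
```

The unnamed submit button is skipped and the valueless text box is sent as an empty string, which is what the `input[name]` selector and `i.get('value', '')` do in the answer's code.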
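Answer 1: (score: 1)
You want .text, not the href attribute, and you also need a wait condition to allow the page to update. The links can be matched with this CSS selector: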
#dataLstSubCat a
Then extract .text in a loop or a list comprehension:
items = [item.text for item in soup.select('#dataLstSubCat a')]
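You can do everything with Selenium. You need a wait condition to make sure the content is present, and an additional wait condition for the text to change after the first iteration. The time.sleep I have used is suboptimal.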
items = [item.text for item in WebDriverWait(driver,5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#dataLstSubCat a")))]
Additional imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
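You could probably do the whole thing with POST requests and an initial GET, since the page appears to use __doPostBack (.aspx), where the value from the dropdown above is used to return the child items.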
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
import time

driver = webdriver.Chrome() #'~/chromedriver.exe')
driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')
lst_name = ['Acrylic Coatings','Glass Flake Coatings']
for i in lst_name:
    driver.find_element_by_xpath("//select[@name='txtProduct']/option[text()="+"'"+str(i)+"'"+"]").click()
    items = [item.text for item in WebDriverWait(driver,5).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#dataLstSubCat a")))]
    print(items)
    time.sleep(2)
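
One way to replace the fixed time.sleep(2) with the "text has changed" wait mentioned above is a small polling helper. This is only a sketch of the idea: `wait_for_change` is a hypothetical name, and the demo uses a stand-in for the browser so it runs without Selenium. With a real driver, the callable could be something like `lambda: [a.text for a in driver.find_elements(By.CSS_SELECTOR, "#dataLstSubCat a")]`.

```python
import time


def wait_for_change(get_value, previous, timeout=5.0, poll=0.1):
    """Poll get_value() until it returns a non-empty value different from previous.

    Raises TimeoutError if nothing changes within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = get_value()
        if current and current != previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("value did not change within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for the page: reports the old list twice, then the updated one.
responses = iter([["old"], ["old"], ["new"]])
result = wait_for_change(lambda: next(responses), previous=["old"], poll=0.01)
print(result)  # ['new']
```

Inside the loop you would keep the items from the previous iteration and pass them as `previous`, so the loop continues as soon as the table has been refreshed instead of always pausing for two seconds.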
Answer 2: (score: 0)
Use the following code. It prints the text of the subcategory links for me.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver = webdriver.Chrome('~/chromedriver.exe')
driver.get('http://www.asianpaintsppg.com/applications/protective_products.aspx')
lst_name = ['Acrylic Coatings','Glass Flake Coatings']
for i in lst_name:
    driver.find_element_by_xpath("//select[@name='txtProduct']/option[text()="+"'"+str(i)+"'"+"]").click()
    elements=WebDriverWait(driver, 10).until(expected_conditions.presence_of_all_elements_located((By.XPATH, '//table[@id="dataLstSubCat"]//tr//td//a[starts-with(@id,"dataLstSubCat_LnkBtnSubCat_")]')))
    for ele in elements:
        print(ele.text)
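
The option XPath in these snippets is built by string concatenation, which is easy to mistype. A small helper, sketched here under the hypothetical name `option_xpath`, produces the same string with an f-string and rejects text that would break the quoting:

```python
def option_xpath(select_name, option_text):
    """Build an XPath for the <option> with the given visible text."""
    if "'" in option_text:
        # Single quotes would terminate the XPath string literal early.
        raise ValueError("option text containing a single quote is not supported")
    return f"//select[@name='{select_name}']/option[text()='{option_text}']"


for name in ['Acrylic Coatings', 'Glass Flake Coatings']:
    print(option_xpath('txtProduct', name))
```

The result can be passed to `driver.find_element_by_xpath(...)` as in the code above, or to `driver.find_element(By.XPATH, ...)` on Selenium 4 releases where the `find_element_by_*` helpers are no longer available.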