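Following this post, I got a solution from @DebanjanB, but I cannot use it for all of my PRODUCT TYPE values; it seems to work only for Acrylics and Coal Tar. How can I use it for all product types?
This is the solution: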
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])
But when I use it for Alkyds:
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])
it does not work.
Any suggestions on how to make it work would be appreciated.
Thanks
Answer 0 (score: 2)
I tried the code below, and it returns the products for the product type you want.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver=webdriver.Chrome()
driver.get("http://www.carboline.com/products/")
driver.maximize_window()
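# find_element_by_css_selector was removed in Selenium 4.3; find_element(By.CSS_SELECTOR, ...) works in Selenium 3 and 4
driver.find_element(By.CSS_SELECTOR, 'a.close-privacy-cookie.acceptButton').click()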
element=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"h5#Typeh5 span")))
element.click()
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//div[@aria-labelledby='Typeh5']//ul[@id='Type']//li//label[contains(.,'Alkyds')]"))).click()
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[@id='productList']//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])
['Carbocoat 115', 'Carbocoat 115 VOC', 'Carbocoat 116', 'Carbocoat 140', 'Carbocoat 150 Universal Primer', 'Carbocoat 153', 'Carbocoat 2600', 'Carbocoat 2900', 'Carbocoat 2901', 'Carbocoat 30', 'Carbocoat 45 Industrial Enamel', 'Carbocoat 56', 'Carbocoat 70', 'Carbocoat 8215', 'Carbocoat 8215 Non-Skid', 'Carbocoat 8215 VOC', 'Carbocoat 8216 Non-Skid', 'Carbocoat 8225', 'Carbocoat 8229 Non-Lift Primer', 'Carbocoat 8239', 'Carbocoat 8245', 'Carbocoat 8259 WR', 'Carbocoat 8287 WR', 'Carbocoat OEM Universal Primer']
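To apply the same lookup to several product types, the XPath can be built from the type name instead of being hard-coded. The following is a minimal sketch: the helper name build_product_xpath and the product_types list are hypothetical, and only the three type names mentioned in this thread are used. It assumes the type name contains no single quote, since that would break the XPath string literal.

```python
def build_product_xpath(product_type):
    """Return the XPath for the product links listed under one product type."""
    return (
        "//ul[@id='productList']"
        f"//li[@class='topLevel' and @data-types='{product_type}']"
        "//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]"
    )


# Hypothetical list: only these type names appear in this thread
product_types = ["Acrylics", "Alkyds", "Coal Tar"]

for product_type in product_types:
    # Each XPath can be passed to EC.presence_of_all_elements_located((By.XPATH, ...))
    print(product_type, "->", build_product_xpath(product_type))
```

Each generated XPath would still need the matching filter checkbox to be clicked first, as in the code above, before the wait can find the elements.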
Answer 1 (score: 1)
Does this meet your needs?
import pandas as pd
from bs4 import BeautifulSoup
import requests
response = requests.get('http://www.carboline.com/products/')
soup = BeautifulSoup(response.text, 'html.parser')
products = soup.find('ul', {'id':'productList'})
lists = products.find_all('li',{'class':'topLevel'})
rows = []
for each in lists:
    a = each.find('a')
    # Collect the link text and href of each top-level list item
    rows.append([a.text, a['href']])
# DataFrame.append was removed in pandas 2.0, so build the frame once from the collected rows
results = pd.DataFrame(rows, columns=['product_type', 'href'])
Output:
print(results)
product_type href
0 A/D Firefilm III /products/product-details/?prod=35AD
1 A/D Firefilm III C /products/product-details/?prod=48AD
2 A/D TC-55 SEALER /products/product-details/?prod=30AD
3 Accelerator A-20 /products/product-details/?prod=50AD
4 Acrilast Caulk /products/product-details/?prod=0177
5 Add-2 Mildewcide Additive /products/product-details/?prod=0658
6 Additive 101 /products/product-details/?prod=P262
7 Additive 47 /products/product-details/?prod=0547
8 Additive 8504 /products/product-details/?prod=8504
9 Additive 8505 /products/product-details/?prod=8505
10 Additive 8506 /products/product-details/?prod=8506
11 Additive 8509 /products/product-details/?prod=8509
12 Bitumastic 300 LH /products/product-details/?prod=0168
13 Bitumastic 300 M /products/product-details/?prod=0165&global=true
14 Bitumastic 300 M COE /products/product-details/?prod=0391
15 Bitumastic 50 /products/product-details/?prod=0025
16 Carbocoat 115 /products/product-details/?prod=0801
17 Carbocoat 115 VOC /products/product-details/?prod=206F
18 Carbocoat 116 /products/product-details/?prod=0295
19 Carbocoat 140 /products/product-details/?prod=228F
20 Carbocoat 150 Universal Primer /products/product-details/?prod=0808&global=true
21 Carbocoat 153 /products/product-details/?prod=0632
22 Carbocoat 2600 /products/product-details/?prod=0005
23 Carbocoat 2900 /products/product-details/?prod=0010
24 Carbocoat 2901 /products/product-details/?prod=0012
25 Carbocoat 30 /products/product-details/?prod=P483
26 Carbocoat 45 Industrial Enamel /products/product-details/?prod=0171
27 Carbocoat 56 /products/product-details/?prod=DM56
28 Carbocoat 70 /products/product-details/?prod=1519
29 Carbocoat 8215 /products/product-details/?prod=8215
.. ... ...
470 Thinner 2 /products/product-details/?prod=0522
471 Thinner 21 /products/product-details/?prod=0521
472 Thinner 213 /products/product-details/?prod=0555
473 Thinner 214 /products/product-details/?prod=0556
474 Thinner 215 /products/product-details/?prod=0557
475 Thinner 221 /products/product-details/?prod=0546
476 Thinner 224 /products/product-details/?prod=0574
477 Thinner 225 E /products/product-details/?prod=0591
478 Thinner 228 /products/product-details/?prod=0570
479 Thinner 230 /products/product-details/?prod=0551
480 Thinner 231 /products/product-details/?prod=0516
481 Thinner 234 /products/product-details/?prod=0562
482 Thinner 235 /products/product-details/?prod=0563
483 Thinner 236 E /products/product-details/?prod=0564
484 Thinner 238 /products/product-details/?prod=0566
485 Thinner 241 /products/product-details/?prod=0374
486 Thinner 242 E /products/product-details/?prod=T242
487 Thinner 243 E /products/product-details/?prod=T243
488 Thinner 246 /products/product-details/?prod=T246
489 Thinner 248 /products/product-details/?prod=215F
490 Thinner 25 /products/product-details/?prod=0525
491 Thinner 254 /products/product-details/?prod=0631
492 Thinner 26 /products/product-details/?prod=0526
493 Thinner 33 /products/product-details/?prod=0533
494 Thinner 38 /products/product-details/?prod=TH39
495 Thinner 45 /products/product-details/?prod=0545
496 Thinner 72 /products/product-details/?prod=0572
497 Thinner 76 /products/product-details/?prod=0576
498 Zinc Filler Type II /products/product-details/?prod=0229
499 Zinc Filler Type III /products/product-details/?prod=0224
[500 rows x 2 columns]
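Answer 2 (score: 1)
I would shorten it as follows, using the starts-with operator (^=) to match a substring at the beginning of the href attribute value.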
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
r = requests.get('http://www.carboline.com/products/')
soup = bs(r.content, 'lxml')
df = pd.DataFrame([(item.text, 'http://www.carboline.com' + item['href']) for item in soup.select('[href^="/products/product-details/?prod="]')], columns = ['product', 'link'])
print(df)
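For readers unfamiliar with the ^= operator, here is a rough standard-library sketch of the same idea: keep only the links whose href starts with a given prefix. It swaps BeautifulSoup for html.parser.HTMLParser purely for illustration. The class name ProductLinkParser and the SAMPLE_HTML markup are hypothetical; the two product names and hrefs are taken from the output shown earlier in this thread.

```python
from html.parser import HTMLParser

PREFIX = "/products/product-details/?prod="

# Hypothetical markup, loosely modelled on the product list discussed above
SAMPLE_HTML = """
<ul id="productList">
  <li class="topLevel"><h5 class="x"><a href="/products/product-details/?prod=0801">Carbocoat 115</a></h5></li>
  <li class="topLevel"><h5 class="x"><a href="/products/product-details/?prod=0522">Thinner 2</a></h5></li>
  <li><a href="/about/">About</a></li>
</ul>
"""


class ProductLinkParser(HTMLParser):
    """Collect (text, href) pairs for <a> tags whose href starts with PREFIX."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Same test that [href^="..."] expresses in a CSS selector
            if href.startswith(PREFIX):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


parser = ProductLinkParser()
parser.feed(SAMPLE_HTML)
print(parser.links)
```

The "About" link is skipped because its href does not begin with the prefix, which is what the CSS selector above does in a single expression.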