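Unable to extract all href text with Python Selenium

Time: 2019-06-04 10:05:00

Tags: python selenium web-scraping

Referring to this post, I got a solution from @DebanjanB, but I cannot apply it to all of my PRODUCT TYPE values; it seems to work only for Acrylics and Coal Tar. How can I use it for all product types?

Here is the solution: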

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Acrylics']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

But when I use it for Alkyds:

print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

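it does not work.

Any suggestions on how to make it work?

Thanks

3 answers:

Answer 0 (score: 2)

I tried the following code, which opens the Type filter, selects Alkyds, and then returns the products of that type.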

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver=webdriver.Chrome()
driver.get("http://www.carboline.com/products/")
driver.maximize_window()
# Dismiss the cookie banner; find_element(By...) replaces the find_element_by_* helpers removed in later Selenium 4 releases
driver.find_element(By.CSS_SELECTOR, 'a.close-privacy-cookie.acceptButton').click()
# Expand the "Type" filter section
element=WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"h5#Typeh5 span")))
element.click()
# Select the Alkyds filter
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH,"//div[@aria-labelledby='Typeh5']//ul[@id='Type']//li//label[contains(.,'Alkyds')]"))).click()
# Wait for presence (not visibility) of the matching product links, then print their text
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.presence_of_all_elements_located((By.XPATH, "//ul[@id='productList']//li[@class='topLevel' and @data-types='Alkyds']//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]")))])

Output:

['Carbocoat 115', 'Carbocoat 115 VOC', 'Carbocoat 116', 'Carbocoat 140', 'Carbocoat 150 Universal Primer', 'Carbocoat 153', 'Carbocoat 2600', 'Carbocoat 2900', 'Carbocoat 2901', 'Carbocoat 30', 'Carbocoat 45 Industrial Enamel', 'Carbocoat 56', 'Carbocoat 70', 'Carbocoat 8215', 'Carbocoat 8215 Non-Skid', 'Carbocoat 8215 VOC', 'Carbocoat 8216 Non-Skid', 'Carbocoat 8225', 'Carbocoat 8229 Non-Lift Primer', 'Carbocoat 8239', 'Carbocoat 8245', 'Carbocoat 8259 WR', 'Carbocoat 8287 WR', 'Carbocoat OEM Universal Primer']
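
The locator in the answer above differs between product types only in its data-types value, so it could be built from a parameter. The following is a minimal sketch under that assumption; `product_xpath` is a hypothetical helper name (it is not part of Selenium), and it assumes the page keeps the markup targeted above and that the type name contains no single quote.

```python
def product_xpath(product_type):
    """Build the product-link XPath for one product type.

    Hypothetical helper: assumes the markup targeted by the answer above
    (li.topLevel items carrying a data-types attribute).
    """
    # A single quote would break the quoted XPath literal, so reject it.
    if "'" in product_type:
        raise ValueError("product type must not contain a single quote")
    return (
        "//ul[@id='productList']"
        f"//li[@class='topLevel' and @data-types='{product_type}']"
        "//h5[@class]/a[starts-with(@href, '/products/product-details/?prod=')]"
    )


if __name__ == "__main__":
    # Type names taken from the question; other values are untested.
    for name in ("Acrylics", "Alkyds", "Coal Tar"):
        print(product_xpath(name))
```

With the driver from the answer, the result would be passed as (By.XPATH, product_xpath('Alkyds')) to EC.presence_of_all_elements_located, presumably after selecting the matching filter label in the same way.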

Answer 1 (score: 1)

Does this meet your needs?

import pandas as pd
from bs4 import BeautifulSoup
import requests

response = requests.get('http://www.carboline.com/products/')
soup = BeautifulSoup(response.text, 'html.parser')

# Each product is an <li class="topLevel"> inside <ul id="productList">
products = soup.find('ul', {'id':'productList'})
lists = products.find_all('li',{'class':'topLevel'})

# Collect the text and href of the first link in each product item
rows = []
for each in lists:
    a = each.find('a')
    rows.append([a.text, a['href']])

# Build the frame once; DataFrame.append was removed in pandas 2.0
results = pd.DataFrame(rows, columns=['product_type', 'href'])

Output:

print(results)
                       product_type                                              href
0                  A/D Firefilm III              /products/product-details/?prod=35AD
1                A/D Firefilm III C              /products/product-details/?prod=48AD
2                  A/D TC-55 SEALER              /products/product-details/?prod=30AD
3                  Accelerator A-20              /products/product-details/?prod=50AD
4                    Acrilast Caulk              /products/product-details/?prod=0177
5         Add-2 Mildewcide Additive              /products/product-details/?prod=0658
6                      Additive 101              /products/product-details/?prod=P262
7                       Additive 47              /products/product-details/?prod=0547
8                     Additive 8504              /products/product-details/?prod=8504
9                     Additive 8505              /products/product-details/?prod=8505
10                    Additive 8506              /products/product-details/?prod=8506
11                    Additive 8509              /products/product-details/?prod=8509
12                Bitumastic 300 LH              /products/product-details/?prod=0168
13                 Bitumastic 300 M  /products/product-details/?prod=0165&global=true
14             Bitumastic 300 M COE              /products/product-details/?prod=0391
15                    Bitumastic 50              /products/product-details/?prod=0025
16                    Carbocoat 115              /products/product-details/?prod=0801
17                Carbocoat 115 VOC              /products/product-details/?prod=206F
18                    Carbocoat 116              /products/product-details/?prod=0295
19                    Carbocoat 140              /products/product-details/?prod=228F
20   Carbocoat 150 Universal Primer  /products/product-details/?prod=0808&global=true
21                    Carbocoat 153              /products/product-details/?prod=0632
22                   Carbocoat 2600              /products/product-details/?prod=0005
23                   Carbocoat 2900              /products/product-details/?prod=0010
24                   Carbocoat 2901              /products/product-details/?prod=0012
25                     Carbocoat 30              /products/product-details/?prod=P483
26   Carbocoat 45 Industrial Enamel              /products/product-details/?prod=0171
27                     Carbocoat 56              /products/product-details/?prod=DM56
28                     Carbocoat 70              /products/product-details/?prod=1519
29                   Carbocoat 8215              /products/product-details/?prod=8215
..                              ...                                               ...
470                       Thinner 2              /products/product-details/?prod=0522
471                      Thinner 21              /products/product-details/?prod=0521
472                     Thinner 213              /products/product-details/?prod=0555
473                     Thinner 214              /products/product-details/?prod=0556
474                     Thinner 215              /products/product-details/?prod=0557
475                     Thinner 221              /products/product-details/?prod=0546
476                     Thinner 224              /products/product-details/?prod=0574
477                   Thinner 225 E              /products/product-details/?prod=0591
478                     Thinner 228              /products/product-details/?prod=0570
479                     Thinner 230              /products/product-details/?prod=0551
480                     Thinner 231              /products/product-details/?prod=0516
481                     Thinner 234              /products/product-details/?prod=0562
482                     Thinner 235              /products/product-details/?prod=0563
483                   Thinner 236 E              /products/product-details/?prod=0564
484                     Thinner 238              /products/product-details/?prod=0566
485                     Thinner 241              /products/product-details/?prod=0374
486                   Thinner 242 E              /products/product-details/?prod=T242
487                   Thinner 243 E              /products/product-details/?prod=T243
488                     Thinner 246              /products/product-details/?prod=T246
489                     Thinner 248              /products/product-details/?prod=215F
490                      Thinner 25              /products/product-details/?prod=0525
491                     Thinner 254              /products/product-details/?prod=0631
492                      Thinner 26              /products/product-details/?prod=0526
493                      Thinner 33              /products/product-details/?prod=0533
494                      Thinner 38              /products/product-details/?prod=TH39
495                      Thinner 45              /products/product-details/?prod=0545
496                      Thinner 72              /products/product-details/?prod=0572
497                      Thinner 76              /products/product-details/?prod=0576
498             Zinc Filler Type II              /products/product-details/?prod=0229
499            Zinc Filler Type III              /products/product-details/?prod=0224

[500 rows x 2 columns]

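Answer 2 (score: 1)

I would shorten it as follows, using the starts-with operator (^=) to match a substring at the beginning of the href attribute value.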

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd

r = requests.get('http://www.carboline.com/products/')
soup = bs(r.content, 'lxml')
df = pd.DataFrame([(item.text, 'http://www.carboline.com' + item['href']) for item in soup.select('[href^="/products/product-details/?prod="]')], columns = ['product', 'link'])
print(df)
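
To cover every product type in one pass, the links could also be grouped by the data-types attribute that the question's XPath relies on. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so it runs offline, `ProductsByType`, `group_products`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented (the Acrylics entry in particular is a placeholder, not a real product). It assumes each li.topLevel carries exactly one type in data-types.

```python
from collections import defaultdict
from html.parser import HTMLParser

PREFIX = "/products/product-details/?prod="


class ProductsByType(HTMLParser):
    """Collect product link texts, keyed by the data-types of the
    most recently opened <li class="topLevel">."""

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(list)
        self._type = None       # data-types of the current topLevel item
        self._in_link = False   # True while inside a product link
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and "topLevel" in (attrs.get("class") or "").split():
            self._type = attrs.get("data-types")
        elif tag == "a" and self._type and (attrs.get("href") or "").startswith(PREFIX):
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.groups[self._type].append("".join(self._text).strip())
            self._in_link = False


def group_products(html):
    """Return {product type: [product names]} for the given HTML text."""
    parser = ProductsByType()
    parser.feed(html)
    return dict(parser.groups)


# Invented sample markup that mimics the assumed page structure.
SAMPLE_HTML = """
<ul id="productList">
  <li class="topLevel" data-types="Alkyds"><h5 class="t"><a href="/products/product-details/?prod=0801">Carbocoat 115</a></h5><a href="/sds/">SDS</a></li>
  <li class="topLevel" data-types="Alkyds"><h5 class="t"><a href="/products/product-details/?prod=0295">Carbocoat 116</a></h5></li>
  <li class="topLevel" data-types="Acrylics"><h5 class="t"><a href="/products/product-details/?prod=TEST">Sample Acrylic Product</a></h5></li>
</ul>
"""

if __name__ == "__main__":
    for product_type, names in group_products(SAMPLE_HTML).items():
        print(product_type, names)
```

If the HTML returned by requests does include the data-types attributes (not verified here), group_products(r.text) could be tried on the live page; otherwise the Selenium approach from the first answer would still be needed.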