How do I extract a text element with Selenium in Python?

Time: 2019-07-25 19:37:25

Tags: python selenium class xpath

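I am using Selenium to scrape content from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830

I am trying to extract the text field "As subject matter experts, our team is very engaging..."

I tried to find the elements by class name: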

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

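But it returns an empty list [].

Is there something wrong with the code, or is there a better solution? Thanks for your help.

4 answers:

Answer 0: (score: 3)

Use requests and BeautifulSoup: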

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

Output:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!
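
To collect every review rather than only the first match, one option is to walk all `blockquote > p` elements instead of stopping at one. The sketch below is only an illustration under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without third-party packages, it parses a made-up inline `sample_html` rather than the live page, and the class name `BlockquoteParagraphs` is hypothetical.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be tracked as "open".
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class BlockquoteParagraphs(HTMLParser):
    """Collect the text of every <p> that is a direct child of a <blockquote>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, outermost first
        self.texts = []      # finished paragraph texts
        self._buffer = None  # text pieces of the paragraph being read, else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p" and self.stack and self.stack[-1] == "blockquote":
            self._buffer = []
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.texts.append("".join(self._buffer).strip())
            self._buffer = None
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


# Made-up sample markup, loosely shaped like a review list (an assumption).
sample_html = """
<div class="we-customer-review">
  <blockquote><p dir="ltr">First review text.</p></blockquote>
  <blockquote><p dir="ltr">Second <b>review</b> text.</p></blockquote>
  <p>Not a review.</p>
</div>
"""

parser = BlockquoteParagraphs()
parser.feed(sample_html)
print(parser.texts)
```

With BeautifulSoup itself, the analogous step would be to iterate over `soup.select("blockquote > p")` instead of calling `select_one`.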

Answer 1: (score: 2)

You can use WebDriverWait to wait until the elements are visible and then read their text. Also take a look at what makes a good selenium locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    stars = review_rating.find_element(By.CSS_SELECTOR, ".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element(By.CSS_SELECTOR, "h3").text
    review = review_rating.find_element(By.CSS_SELECTOR, "p").text
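
As for why the original attempt came back empty: a class-name locator expects a single class name, so a space-separated string of several classes is not read as "an element carrying all of these classes". A compound CSS selector expresses that instead. The helper below is a minimal sketch of the conversion; the function name `classes_to_css_selector` is hypothetical, and the commented-out usage line assumes an already created `driver`.

```python
def classes_to_css_selector(class_attribute: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector."""
    names = class_attribute.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b.c" matches elements that carry all of the classes a, b and c.
    return "".join("." + name for name in names)


selector = classes_to_css_selector(
    "we-truncate we-truncate--multi-line we-customer-review__body"
)
print(selector)
# .we-truncate.we-truncate--multi-line.we-customer-review__body

# Hypothetical usage with an existing driver (not executed here):
# elements = driver.find_elements(By.CSS_SELECTOR, selector)
```

In practice a single stable class such as `.we-customer-review__body` is usually enough, and shorter selectors tend to survive markup changes better than a full class list.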

Answer 2: (score: 2)

May I suggest combining selenium with BeautifulSoup? Using the webdriver:

from bs4 import BeautifulSoup
from selenium import webdriver
browser=webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

Output:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

If anything needs explaining, just let me know!

Answer 3: (score: 2)

Use WebDriverWait to wait for presence_of_all_elements_located, and use the following CSS selector.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')
    )
)
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)