Question

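I am using Selenium to scrape content from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830

I am trying to extract the review text "As subject matter experts, our team is very engaging..."

I tried to find the elements by class name: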

review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings

But it returns an empty list [].

Is there anything wrong with the code, or is there a better solution? Thanks for your help.

Answer 1

Use requests and BeautifulSoup:

import requests
from bs4 import BeautifulSoup

url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'

res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)

Output:

As subject matter experts, our team is very engaging and focused on our near and long term financial health!
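Note that select_one returns only the first match. If you want every review body rather than just the first, a minimal sketch along the same lines could use select instead. The sample markup and the function name extract_review_texts below are assumptions made for illustration; the real page structure may differ, so check the selector against the actual HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page source.
SAMPLE_HTML = """
<div class="we-customer-review">
  <blockquote><p>First review text.</p></blockquote>
</div>
<div class="we-customer-review">
  <blockquote><p>Second review text.</p></blockquote>
</div>
"""


def extract_review_texts(html):
    """Return the text of every <p> that is a direct child of a <blockquote>."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns all matches; select_one() would return only the first.
    return [p.get_text(strip=True) for p in soup.select("blockquote > p")]


if __name__ == "__main__":
    for text in extract_review_texts(SAMPLE_HTML):
        print(text)
```

With the live page you would pass res.text from the snippet above instead of SAMPLE_HTML.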

Answer 2

You can use WebDriverWait to wait until the elements are visible and then get the text. It is also worth reading up on how to write a good Selenium locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

#...

wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    # find_element(By.CSS_SELECTOR, ...) replaces the find_element_by_* helpers removed in Selenium 4
    stars = review_rating.find_element(By.CSS_SELECTOR, ".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element(By.CSS_SELECTOR, "h3").text
    review = review_rating.find_element(By.CSS_SELECTOR, "p").text
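
As for the empty list in the question, the likely reason is that the class-name locator expects a single class name, so a space-separated string of several classes matches nothing. If you do want to match on several classes at once, one option is to turn the class attribute into a compound CSS selector. A minimal sketch, where class_string_to_css is a hypothetical helper and not part of Selenium:

```python
def class_string_to_css(class_attr):
    """Turn an HTML class attribute value such as 'a b c' into the CSS selector '.a.b.c'."""
    # split() with no arguments ignores leading, trailing and repeated whitespace.
    return "".join("." + name for name in class_attr.split())


if __name__ == "__main__":
    selector = class_string_to_css("we-truncate we-customer-review__body")
    print(selector)  # .we-truncate.we-customer-review__body
```

The resulting string can then be passed to driver.find_elements(By.CSS_SELECTOR, selector). In practice one or two stable classes are usually enough; generated classes such as ember-view are best left out.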

Answer 3

May I suggest combining Selenium with BeautifulSoup? Using the web driver:

from bs4 import BeautifulSoup
from selenium import webdriver
browser=webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")

bs = BeautifulSoup(innerHTML, 'html.parser')

bs.blockquote.p.text

Output:

Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'

If anything needs explaining, just let me know!

Answer 4

Use WebDriverWait, wait for presence_of_all_elements_located, and use the following CSS selector:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings =WebDriverWait(driver,20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,'.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)
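
Because innerHTML returns markup rather than plain text, the collected strings may still contain tags or entities such as &amp;. Reading the element's .text property instead is one way around that. If you want to keep innerHTML, a rough cleanup sketch using only the standard library might look like this; inner_html_to_text is a hypothetical helper, and the regex-based tag stripping is only meant for short, simple fragments:

```python
import html
import re


def inner_html_to_text(fragment):
    """Convert a small HTML fragment to plain text (crude, regex-based)."""
    # Replace <br> variants with a space so adjacent words do not run together.
    fragment = re.sub(r"<br\s*/?>", " ", fragment, flags=re.IGNORECASE)
    # Drop any remaining tags.
    fragment = re.sub(r"<[^>]+>", "", fragment)
    # Decode entities such as &amp; and &#39;.
    text = html.unescape(fragment)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(inner_html_to_text("Great app &amp; team!<br>It&#39;s <b>fast</b>."))
```

For anything more complex than a short review snippet, a real HTML parser such as BeautifulSoup is the safer choice.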
