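I am using Selenium to scrape content from the App Store: https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830
I am trying to extract the review text "As subject matter experts, our team is very engaging..."
I tried to find the elements by class name: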
review_ratings = driver.find_elements_by_class_name('we-truncate we-truncate--multi-line we-truncate--interactive ember-view we-customer-review__body')
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
review_ratings
But it returns an empty list [].
Is there something wrong with the code, or is there a better solution? Thanks for your help.
Answer 0: (score: 3)
Use requests and BeautifulSoup:
import requests
from bs4 import BeautifulSoup
url = 'https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830'
res = requests.get(url)
soup = BeautifulSoup(res.text,'lxml')
item = soup.select_one("blockquote > p").text
print(item)
Output:
As subject matter experts, our team is very engaging and focused on our near and long term financial health!
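Note that select_one returns only the first match. To collect every review body present in the served HTML, a minimal sketch could use select instead. The helper name extract_review_texts and the sample markup are hypothetical; the sketch assumes each review body is a p directly inside a blockquote, as the selector above implies, and uses the built-in html.parser so lxml is not required.

```python
from bs4 import BeautifulSoup


def extract_review_texts(html):
    """Return the stripped text of every <p> directly inside a <blockquote>."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.select("blockquote > p")]


# Hypothetical markup standing in for res.text
sample_html = """
<div><blockquote><p>First review.</p></blockquote></div>
<div><blockquote><p>Second review.</p></blockquote></div>
"""
reviews = extract_review_texts(sample_html)
print(reviews)
```

With the code above, it would be called as extract_review_texts(res.text).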
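Answer 1: (score: 2)
You can use WebDriverWait to wait for the elements to become visible and then read their text. Also see "good selenium locator" for advice on choosing locators.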
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
#...
wait = WebDriverWait(driver, 5)
review_ratings = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review")))
for review_rating in review_ratings:
    # The star rating is exposed through the element's aria-label attribute
    stars = review_rating.find_element_by_css_selector(".we-star-rating").get_attribute("aria-label")
    title = review_rating.find_element_by_css_selector("h3").text
    review = review_rating.find_element_by_css_selector("p").text
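The find_element_by_* helpers were deprecated in Selenium 4 and removed in later 4.x releases, so on a current install the loop above may raise AttributeError; find_element(By.CSS_SELECTOR, ...) is the replacement. Below is a minimal sketch of the same loop in that style, collecting the values into dictionaries. The names collect_reviews and parse_star_label are hypothetical, and the "N out of 5" aria-label format is an assumption about the page.

```python
import re


def parse_star_label(label):
    """Pull the leading number out of a label such as '5 out of 5' (assumed format)."""
    match = re.search(r"\d+(?:\.\d+)?", label or "")
    return float(match.group()) if match else None


def collect_reviews(driver, timeout=5):
    """Wait for the review cards and return a list of dicts (hypothetical helper)."""
    # Imported here so parse_star_label can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    cards = WebDriverWait(driver, timeout).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, ".we-customer-review"))
    )
    results = []
    for card in cards:
        label = card.find_element(By.CSS_SELECTOR, ".we-star-rating").get_attribute("aria-label")
        results.append({
            "stars": parse_star_label(label),
            "title": card.find_element(By.CSS_SELECTOR, "h3").text,
            "review": card.find_element(By.CSS_SELECTOR, "p").text,
        })
    return results
```

After driver.get(url), calling collect_reviews(driver) would return one dictionary per visible review card.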
Answer 2: (score: 2)
May I suggest combining selenium with BeautifulSoup?
Using the web driver:
from bs4 import BeautifulSoup
from selenium import webdriver
browser=webdriver.Chrome()
url = "https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830"
browser.get(url)
innerHTML = browser.execute_script("return document.body.innerHTML")
bs = BeautifulSoup(innerHTML, 'html.parser')
bs.blockquote.p.text
Output:
Out[22]: 'As subject matter experts, our team is very engaging and focused on our near and long term financial health!'
If anything needs explaining, just let me know!
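One caveat: bs.blockquote.p.text raises AttributeError when the parsed HTML contains no blockquote, for example if the reviews have not rendered yet. A minimal guarded sketch follows; first_review_text is a hypothetical helper, and the sample strings stand in for the HTML returned by the browser.

```python
from bs4 import BeautifulSoup


def first_review_text(html):
    """Return the text of the first <p> inside a <blockquote>, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    paragraph = soup.select_one("blockquote p")
    return paragraph.get_text(strip=True) if paragraph else None


print(first_review_text("<blockquote><p>Great app.</p></blockquote>"))
print(first_review_text("<div>No reviews rendered yet</div>"))
```

With the driver above it could be called as first_review_text(browser.page_source).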
Answer 3: (score: 2)
Use WebDriverWait, wait for presence_of_all_elements_located, and use the following CSS selector.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get("https://apps.apple.com/us/app/bank-of-america-private-bank/id1096813830")
review_ratings = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.we-customer-review__body p[dir="ltr"]')))
review_ratingsList = []
for e in review_ratings:
    review_ratingsList.append(e.get_attribute('innerHTML'))
print(review_ratingsList)
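For contrast with the CSS selector above: the question passes several space-separated class names to find_elements_by_class_name, which expects a single class name, so that locator is a likely reason for the empty list. Below is a sketch of turning such a class attribute into a compound CSS selector. classes_to_css is a hypothetical helper, and skipping ember-view is a judgment call on the assumption that it is framework-generated.

```python
def classes_to_css(class_attr, skip=()):
    """Convert 'a b c' into '.a.b.c', leaving out any class names listed in skip."""
    return "".join("." + name for name in class_attr.split() if name not in skip)


selector = classes_to_css(
    "we-truncate we-truncate--multi-line we-truncate--interactive "
    "ember-view we-customer-review__body",
    skip=("ember-view",),
)
print(selector)
# It could then be passed to Selenium as:
# driver.find_elements(By.CSS_SELECTOR, selector)
```

A compound selector only matches elements that carry every listed class, so a shorter selector such as the one in this answer is usually less brittle.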