I am looking at this page. I am trying to scrape this data (marked in red) using Selenium and chromedriver:
Here is my Python code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from time import sleep
chrome_options = Options()
chrome_options.add_argument("--disable-infobars")
chrome_options.add_argument("disable-infobars")
driver = webdriver.Chrome(executable_path="/ABC/chromedriver", chrome_options=chrome_options)
driver.get("https://finance.yahoo.com/quote/IBM")
sleep(10)
estimated = driver.find_element_by_class_name("IbBox Ta(start) C($tertiaryColor)")
But the code does not get the Est. Return value; after a long wait, it returns the following error message:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
What am I doing wrong? What is the best and fastest way to get the Est. Return value from the page?
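Update: Here is what I see using Inspect Element in Chrome:
Answer 0: (score: 1)
Headers play an important role in getting the value you are after, so make sure you send one. With that in place, this is how you can get the required content: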
import requests
from bs4 import BeautifulSoup
link = "https://finance.yahoo.com/quote/IBM"
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36'}
r = requests.get(link,headers=headers)
soup = BeautifulSoup(r.text,"lxml")
# Raw string so that Python does not treat \( and \) as (invalid) string escapes
est_return = soup.select_one(r"[class='Mb\(8px\)']").get_text()
print(est_return)
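If you would rather not depend on requests, the same header idea can be sketched with the standard library. This is only a minimal sketch: it swaps requests for urllib, and the names build_request and fetch_html are hypothetical helpers of mine, not part of the answer above. It only shows how the User-Agent gets attached; it makes no promise about what the site actually returns.

```python
from urllib.request import Request, urlopen

# Same User-Agent string as in the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
)


def build_request(url: str, user_agent: str = USER_AGENT) -> Request:
    """Create a request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the page and decode it (hypothetical helper, not called here)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Building the request does not touch the network, so the header can be checked offline.
request = build_request("https://finance.yahoo.com/quote/IBM")
print(request.get_header("User-agent"))
```

Note that urllib stores header names capitalized as "User-agent", which is why the lookup uses that spelling. The resulting HTML string can be passed to BeautifulSoup exactly like r.text.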
Answer 1: (score: 0)
Could you use XPath instead? It should look like this:
# .text is a property of the element, not a method, so it takes no parentheses
estimated = driver.find_element_by_xpath("*//div[@class='IbBox Ta(start) C($tertiaryColor)']").text
Let me know how it goes! :D
Answer 2: (score: 0)
This error message...
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
...implies that the locator strategy you have used is not a valid expression.
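As far as I can tell, the underlying cause is that the class-name locator expects a single class name, while "IbBox Ta(start) C($tertiaryColor)" is three classes that also contain characters such as ( and $, which are not legal in an unescaped CSS identifier. One possible workaround, sketched below, is to build a CSS selector with those characters backslash-escaped. The helper css_selector_for_classes is a hypothetical name of mine, and it assumes every class name starts with a letter (a leading digit would need a different escape).

```python
import re


def css_selector_for_classes(class_attr: str) -> str:
    """Build a CSS selector matching elements that carry every class in class_attr.

    Assumption: each class name starts with a letter; leading digits are not handled.
    """
    parts = []
    for name in class_attr.split():
        # Backslash-escape every character that is not a letter, digit, '_' or '-'.
        escaped = re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)
        parts.append("." + escaped)
    return "".join(parts)


selector = css_selector_for_classes("IbBox Ta(start) C($tertiaryColor)")
print(selector)
# The result could then be passed to Selenium, for example:
#   driver.find_element(By.CSS_SELECTOR, selector)
```

This prints `.IbBox.Ta\(start\).C\(\$tertiaryColor\)`. Whether that selector is unique on the page is a separate question, so check how many elements it matches before relying on it.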
To scrape the text -6% Est. Return, you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following locator strategy:
Using XPATH:
driver.get('https://finance.yahoo.com/quote/IBM')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Near Fair Value']//following::div[1]/div"))).text)
Console output:
-6% Est. Return
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
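If you need the value as a number rather than as the raw string shown in the console output above, a small parser along these lines may help. It is only a sketch: parse_est_return is a hypothetical helper, and it assumes the text uses an ASCII minus sign and a plain percentage such as "-6% Est. Return".

```python
import re
from typing import Optional


def parse_est_return(text: str) -> Optional[float]:
    """Return the percentage in text as a fraction (-6% becomes -0.06), or None."""
    # Optional sign, digits, optional decimal part, then a percent sign.
    match = re.search(r"([+-]?\d+(?:\.\d+)?)\s*%", text)
    if match is None:
        return None
    return float(match.group(1)) / 100


print(parse_est_return("-6% Est. Return"))
```

Returning None when no percentage is found keeps the caller in control of what to do if the page layout changes.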