How do I use Selenium correctly?

Date: 2016-08-03 00:27:07

Tags: python python-3.x selenium-webdriver web-scraping yahoo-finance

I am trying to get a number from Yahoo Finance (http://finance.yahoo.com/quote/AAPL/financials?p=AAPL): Balance Sheet, Total Stockholder Equity. If I inspect the element, I get this:

<span data-reactid=".1doxyl2xoso.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2:1:$BALANCE_SHEET.0.0.$TOTAL_STOCKHOLDER_EQUITY.1:$0.0.0">119,355,000</span>

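I want to scrape this number: 119,355,000.

If I understand correctly, the page is rendered with JavaScript, so I need to use Selenium to get the number. No matter what I do, my attempts don't work (I am a complete beginner); below are three of many. I tried using 'data-reactid' and a few other things, and I have run out of ideas :-)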

elem = Browser.find_element_by_partial_link_text('TOTAL_STOCKHOLDER_EQUITY')
elem = browser.find_element_by_id('TOTAL_STOCKHOLDER_EQUITY') 
elem = browser.find_elem_by_id('TOTAL_STOCKHOLDER_EQUITY')

1 Answer:

Answer 0: (score: 1)

Actually, all of your locators look invalid. Try find_element_by_css_selector instead, as below:

elem = browser.find_element_by_css_selector("span[data-reactid *= 'TOTAL_STOCKHOLDER_EQUITY']")

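Note: find_element_by_partial_link_text only locates a (anchor) elements by their visible text content; it does not match against attribute values. find_element_by_id finds an element whose id attribute exactly matches the value passed.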
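To see why the three attempts in the question miss while the attribute selector matches, here is a small sketch that needs no browser. It parses the span from the question with Python's standard html.parser and evaluates, by hand, the condition each locator strategy relies on. The class FirstElement and the variable names are made up for this illustration; this only mimics the matching rules and is not how Selenium is implemented.

```python
from html.parser import HTMLParser

# The element from the question, copied verbatim.
SPAN = (
    '<span data-reactid=".1doxyl2xoso.1.$0.0.0.3.1.$main-0-Quote-Proxy.'
    '$main-0-Quote.0.2.0.2:1:$BALANCE_SHEET.0.0.$TOTAL_STOCKHOLDER_EQUITY'
    '.1:$0.0.0">119,355,000</span>'
)


class FirstElement(HTMLParser):
    """Record the tag, attributes and text of the first element fed in."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.attrs = {}
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if self.tag is None:
            self.tag, self.attrs = tag, dict(attrs)

    def handle_data(self, data):
        self.text += data


NEEDLE = "TOTAL_STOCKHOLDER_EQUITY"
el = FirstElement()
el.feed(SPAN)
el.close()

# By id: needs an id attribute equal to the value (the span has no id).
by_id = el.attrs.get("id") == NEEDLE
# By partial link text: needs an <a> element whose visible text contains the value.
by_partial_link_text = el.tag == "a" and NEEDLE in el.text
# span[data-reactid *= '...']: needs a span whose data-reactid contains the value.
by_css = el.tag == "span" and NEEDLE in el.attrs.get("data-reactid", "")

print(by_id, by_partial_link_text, by_css)  # False False True
```

Only the substring match on data-reactid succeeds. Also note that in Selenium 4.3 and later the find_element_by_* helpers were removed; the same locator is written browser.find_element(By.CSS_SELECTOR, "span[data-reactid *= 'TOTAL_STOCKHOLDER_EQUITY']"), with By imported from selenium.webdriver.common.by.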

Edited: The locator above matches more than one element, so you should find the exact tr element for Total Stockholder Equity and then find all of its td elements, as below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('http://finance.yahoo.com/quote/AAPL/financials?p=AAPL')
browser.maximize_window()
wait = WebDriverWait(browser, 5)

try:
    # First find the Balance Sheet link and click on it
    balanceSheet = wait.until(EC.element_to_be_clickable((By.XPATH, "//span[text() = 'Balance Sheet']")))
    balanceSheet.click()

    # Now find the row element of Total Stockholder Equity
    totalStockRow = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "tr[data-reactid *= 'TOTAL_STOCKHOLDER_EQUITY']")))

    # Now find all the columns in the Total Stockholder Equity row
    totalColumns = totalStockRow.find_elements(By.TAG_NAME, "td")

    # To print a single value, index into totalColumns; otherwise print all values in a loop
    for elem in totalColumns:
        print(elem.text)

    # This prints:
    # Total Stockholder Equity
    # 119,355,000
    # 111,547,000
    # 123,549,000
except Exception:
    print('Was not able to find the element with that name.')
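The row comes back as text with thousands separators, so if you need the figures as numbers you still have to convert them. A minimal sketch, assuming the cell texts look exactly like the output listed above (a label followed by comma-grouped integers); parse_amount and cell_texts are names made up for this example, and other formats such as negative values or dashes are not handled:

```python
def parse_amount(text):
    """Convert a comma-grouped string such as '119,355,000' to an int."""
    return int(text.strip().replace(",", ""))


# Stand-in for [elem.text for elem in totalColumns] from the answer's loop.
cell_texts = [
    "Total Stockholder Equity",
    "119,355,000",
    "111,547,000",
    "123,549,000",
]

# The first cell is the row label; the rest are the values.
label, *raw_values = cell_texts
values = [parse_amount(v) for v in raw_values]

print(label, values)
# Total Stockholder Equity [119355000, 111547000, 123549000]
```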


Hope it helps... :)