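I'm new to XPath. I'm trying to scrape a stock website to get the name and value of each instrument. In my Python Selenium script, I reproduced the main part of the page locally in html_content, as shown below.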
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options
from selenium.common.exceptions import NoSuchElementException
dirinstall="C:\\Program Files (x86)\\www\\mm\\"
chrome_driver = dirinstall+"\\Webdriver\\chromedriver.exe"
options = Options()
driver = webdriver.Chrome(chrome_driver, options=options)
html_content = """
<html class="ng-scope">
<head data-meta-tags="">
<title> Stock NYSE </title>
<ui-layout class="ng-isolate-scope">
<div data-ng-include="" src="layoutCtrl.template" class="ng-scope">
<app-root class="ng-scope" _nghost-rqp-c0="" ng-version="8.2.14"></app-root>
<div ng-class="{'demo-mode': $root.session.user.portfolio.account.type === 'Demo' }" class="ng-scope">
<div ng-view="" ng-class="layoutCtrl.isBannerShown ? 'banner-shown' : ''" class="main-app-view ng-scope" role="main">
<et-discovery-markets-results class="ng-scope" _nghost-rqp-c42="" ng-version="8.2.14">
<div _ngcontent-rqp-c42="" class="discover main-content no-footer" ui-fun-scroll="{'class': 'minimize', 'classEl': '.user-head-wrapper, .table-discover', 'scrollContainer': '.table-discover', 'setClassAtScroll': 200 }">
<div _ngcontent-rqp-c42="" automation-id="discover-market-results-wrapp" class="table-discover markets-table">
<et-discovery-markets-results-list _ngcontent-rqp-c42="" automation-id="discover-market-results-sub-view-list" _nghost-rqp-c44="" class="ng-star-inserted">
<div _ngcontent-rqp-c44="" class="market-list list-view" data-etoro-locale-ns="discoverMarketResultsList">
<et-instrument-mobile-row _ngcontent-rqp-c44="" automation-id="discover-market-results-row" _nghost-rqp-c18="" class="ng-star-inserted">
<et-instrument-trading-mobile-row _ngcontent-rqp-c18="" automation-id="watchlist-grid-instruments-list" _nghost-rqp-c47="" class="ng-star-inserted">
<div _ngcontent-rqp-c47="" class="row-wrap">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-list-wrapp-instrument" class="instrument-cell name-cell">
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"> </div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-wrapp-instrument-info" class="avatar-info">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-name" class="symbol">A</div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-full-name" class="name positive"> 0.68 (0.90%) </div>
</div>
</div>
<et-buy-sell-buttons _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-buy-sell-container" class="instrument-cell buy-sell-buttons" _nghost-rqp-c24="">
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label positive-change" automation-id="buy-sell-button-container-sell">
<div _ngcontent-rqp-c27="" class="trade-button-title">S</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">75.<span class="after-decimal">85</span></div>
</div>
</et-buy-sell-button>
<div _ngcontent-rqp-c24="" class="space-gap"></div>
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-buy">
<div _ngcontent-rqp-c27="" class="trade-button-title">B</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">76.<span class="after-decimal">03</span></div>
</div>
</et-buy-sell-button>
</et-buy-sell-buttons>
</div>
<et-trade-item-card-action _ngcontent-rqp-c18="" _nghost-rqp-c15="">
</et-trade-item-card-action>
</et-instrument-trading-mobile-row>
</et-instrument-mobile-row>
<et-instrument-mobile-row _ngcontent-rqp-c44="" automation-id="discover-market-results-row" _nghost-rqp-c18="" class="ng-star-inserted">
<et-instrument-trading-mobile-row _ngcontent-rqp-c18="" automation-id="watchlist-grid-instruments-list" _nghost-rqp-c47="" class="ng-star-inserted">
<div _ngcontent-rqp-c47="" class="row-wrap">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-list-wrapp-instrument" class="instrument-cell name-cell">
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"> </div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-wrapp-instrument-info" class="avatar-info">
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-name" class="symbol">AA</div>
<div _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-full-name" class="name negative"> -0.11 (-1.46%) </div>
</div>
</div>
<et-buy-sell-buttons _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-buy-sell-container" class="instrument-cell buy-sell-buttons" _nghost-rqp-c24="">
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-sell">
<div _ngcontent-rqp-c27="" class="trade-button-title">S</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">7.<span class="after-decimal">44</span></div>
</div>
</et-buy-sell-button>
<div _ngcontent-rqp-c24="" class="space-gap"></div>
<et-buy-sell-button _ngcontent-rqp-c24="" _nghost-rqp-c27="">
<div _ngcontent-rqp-c27="" class="prices no-label negative-change" automation-id="buy-sell-button-container-buy">
<div _ngcontent-rqp-c27="" class="trade-button-title">B</div>
<div _ngcontent-rqp-c27="" automation-id="buy-sell-button-rate-value" class="price">7.<span class="after-decimal">47</span></div>
</div>
</et-buy-sell-button>
</et-buy-sell-buttons>
</div>
<et-trade-item-card-action _ngcontent-rqp-c18="" _nghost-rqp-c15="">
</et-trade-item-card-action>
</et-instrument-trading-mobile-row>
</et-instrument-mobile-row>
</div>
</et-discovery-markets-results-list>
</div>
</div>
</et-discovery-markets-results>
</div>
</div>
</div>
</ui-layout>
</body>
</html>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
#results = driver.find_elements_by_xpath("//*[@class='ng-star-inserted']")
results = driver.find_elements_by_xpath("//*[et-instrument-mobile-row and @class='ng-star-inserted']")
print('Number of results', len(results))
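I don't understand why searching for "et-instrument-mobile-row" returns only 1 element instead of 2, and why searching for both "et-instrument-mobile-row" and "ng-star-inserted" returns 0 elements. Going by the example, my goal is to get each symbol together with its current sell and buy values (integer part and decimals).
Something like: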
[A,75.85,76.03]
[AA,7.44,7.47]
Can anyone help me? Thanks!
Answer 0 (score: 0)
You seem to have some malformed HTML, and Selenium isn't sure how to parse it. I noticed this line:
<div _ngcontent-rqp-c47="" class="avatar-img-wrap"><img _ngcontent-rqp-c47="" automation-id="watchlist-item-grid-instrument-avatar" class="avatar-img" src="https://etoro-cdn.etorostatic.com/market-avatars/a/150x150.png" alt="Agilent Technologies Inc">
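This <img> tag is not closed. You'll notice the syntax highlighting gets confused here too.
Otherwise, the XPath you are searching with looks generally well-formed.
Edit: On a closer look, your element name should go where the * is. A name placed inside the square brackets is a predicate that tests for a child element with that name, so //*[et-instrument-mobile-row and @class='ng-star-inserted'] selects elements that contain a row (and also carry exactly that class), not the rows themselves. That is why you got 1 match with the name alone and 0 once the class test was added: the parent div's class is "market-list list-view".
Here is your XPath: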
"//et-instrument-mobile-row[@class='ng-star-inserted']"
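Edit 2: The asker had a follow-up question about how to search within the elements found by the XPath above.
To find more elements inside those elements, see the documentation: every Selenium WebElement provides its own find_element methods. You can use them to search further within the elements we just found. Make sure to start the XPath with .//, because you only want to traverse that particular element's contents; an XPath starting with // searches the whole document even when it is called on an element. The other find_element methods don't have this caveat.
Once you have identified the elements containing the symbol and the prices, you can simply read the text attribute of those elements. Let's look at a simple example: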
<div class="a">
<div class="b" id="1">B</div>
<div class="c" id="2">2</div>
<div class="d" id="3">22</div>
</div>
Suppose we have already found the root div here and stored it in a variable named element. Then:
symbol = element.find_element_by_xpath(".//*[@class='b']").text
integral = element.find_element_by_xpath(".//*[@class='c']").text
fractional = element.find_element_by_xpath(".//*[@class='d']").text
In general, it is easier for everyone involved if you can search by something other than XPath. Here is a more typical way to do this using class names:
symbol = element.find_element_by_class_name("b").text
integral = element.find_element_by_class_name("c").text
fractional = element.find_element_by_class_name("d").text
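If you want to try the relative lookups without a browser, the sketch below runs the same ".//" expressions with the standard-library ElementTree instead of Selenium. The second div with class "a" and the body wrapper are my own additions, there only to show that each lookup stays inside its own root element.

```python
import xml.etree.ElementTree as ET

# The toy markup from above, plus a second (made-up) block and a wrapper element.
page = ET.fromstring("""
<body>
  <div class="a">
    <div class="b" id="1">B</div>
    <div class="c" id="2">2</div>
    <div class="d" id="3">22</div>
  </div>
  <div class="a">
    <div class="b" id="4">C</div>
    <div class="c" id="5">3</div>
    <div class="d" id="6">33</div>
  </div>
</body>
""")

found = []
for element in page.findall(".//div[@class='a']"):
    # Each ".//" search is relative to the current root div only.
    symbol = element.find(".//*[@class='b']").text
    integral = element.find(".//*[@class='c']").text
    fractional = element.find(".//*[@class='d']").text
    found.append((symbol, integral, fractional))

for symbol, integral, fractional in found:
    print(symbol, integral + "." + fractional)
```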
Edit 3: Note from the asker
With the valuable help of @firstbass, I dug deeper to get the symbol and the separate sell and buy prices, as follows:
for element in results:
    # Ticker symbol of this row
    symbol = element.find_element_by_xpath(".//*[@class='symbol']").text
    print(str(symbol))
    # Sell side: locate the container first, then search inside it
    sell = element.find_element_by_xpath(".//et-buy-sell-buttons//et-buy-sell-button//div[@automation-id='buy-sell-button-container-sell']")
    # Note: .text is the element's full visible text, so the price div's text already includes the digits of the nested after-decimal span
    sell_integral = sell.find_element_by_xpath(".//*[@class='price']").text
    sell_fractional = sell.find_element_by_xpath(".//*[@class='after-decimal']").text
    print(str(sell_integral)+':'+str(sell_fractional))
    # Buy side: same lookups inside the buy container
    buy = element.find_element_by_xpath(".//et-buy-sell-buttons//et-buy-sell-button//div[@automation-id='buy-sell-button-container-buy']")
    buy_integral = buy.find_element_by_xpath(".//*[@class='price']").text
    buy_fractional = buy.find_element_by_xpath(".//*[@class='after-decimal']").text
    print(str(buy_integral)+':'+str(buy_fractional))
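For the target format from the question, one possible sketch is to read the whole text of each price element, since the decimals sit in a nested span. The version below uses the standard-library ElementTree on a trimmed, well-formed sample rather than Selenium. SAMPLE, full_text and price are hypothetical names introduced here, and joining itertext() stands in for what Selenium's text attribute gives you on the price div.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the question's markup (illustrative only).
SAMPLE = """
<div class="market-list list-view">
  <et-instrument-mobile-row class="ng-star-inserted">
    <div class="symbol">A</div>
    <div automation-id="buy-sell-button-container-sell">
      <div class="price">75.<span class="after-decimal">85</span></div>
    </div>
    <div automation-id="buy-sell-button-container-buy">
      <div class="price">76.<span class="after-decimal">03</span></div>
    </div>
  </et-instrument-mobile-row>
  <et-instrument-mobile-row class="ng-star-inserted">
    <div class="symbol">AA</div>
    <div automation-id="buy-sell-button-container-sell">
      <div class="price">7.<span class="after-decimal">44</span></div>
    </div>
    <div automation-id="buy-sell-button-container-buy">
      <div class="price">7.<span class="after-decimal">47</span></div>
    </div>
  </et-instrument-mobile-row>
</div>
"""


def full_text(node):
    """Concatenate the node's own text with the text of all its descendants."""
    return "".join(node.itertext()).strip()


def price(row, side):
    """Return the full price string for side 'sell' or 'buy' within one row."""
    container = row.find(
        f".//div[@automation-id='buy-sell-button-container-{side}']"
    )
    return full_text(container.find(".//div[@class='price']"))


root = ET.fromstring(SAMPLE)
quotes = [
    [full_text(row.find(".//div[@class='symbol']")), price(row, "sell"), price(row, "buy")]
    for row in root.findall(".//et-instrument-mobile-row[@class='ng-star-inserted']")
]

for quote in quotes:
    print(quote)
```

The prices are kept as strings here; convert them with float() or decimal.Decimal if you need numbers.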