I want to scrape player prices from https://www.fanteam.com/participate/138905/new/e30= using Python and the Selenium library. I used the following code:
from selenium import webdriver

url = 'https://www.fanteam.com/participate/138905/new/e30='
options = webdriver.ChromeOptions()
options.add_argument('--lang=en')
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options)
driver.get(url)
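However, I can't get the prices for all the players, because I can't find any of the relevant elements on the page (see the image below: players with prices). The site's HTML: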
<!DOCTYPE html>
<html lang="en">
<head>
<script type='text/javascript'>
</script>
<meta charset="UTF-8">
<link rel="shortcut icon" type="image/x-icon" href="/assets/favicon.ico">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no, minimal-ui">
<meta name="mobile-web-app-capable" content="yes">
<meta property="og:title" content="FanTeam: The home of Fantasy Sports">
<meta property="og:description" content="Create Your Daily Fantasy Team, Play & Win Cash!">
<meta property="og:site_name" content="FanTeam">
<meta property="og:image:width" content="300">
<meta property="og:image:height" content="300">
<meta property="og:url" content="https://www.fanteam.com/participate/138905/new/e30=">
<meta property="og:image" content="https://www.fanteam.com/assets/og-banner.png">
<link href="https://fonts.googleapis.com/css?family=Open+Sans:400,300,600,700,800&subset=latin,cyrillic,cyrillic-ext,latin-ext" rel="stylesheet" type="text/css">
<link rel="manifest" href="/manifest.json">
<script>
(function(getDescriptor) {
Object.getOwnPropertyDescriptor = function(obj, key) {
var descriptor = getDescriptor.apply(this, arguments)
if (!descriptor && obj === window && key == "showModalDialog") {
return {}
}
return descriptor
}
}(Object.getOwnPropertyDescriptor));
</script>
<style>
</style>
<title>FanTeam - Daily Fantasy & Betting</title>
</head>
<body>
<ft-cookie-warning></ft-cookie-warning>
<main>
<ft-header logo="fanteam-logo.svg" logosmall="logosmall.svg"></ft-header>
<section class="ft-view-port-wrapper">
<view-port></view-port>
</section>
<ft-footer tabindex="-1" logo="fanteam-logo.svg"></ft-footer>
<ft-push-receiver></ft-push-receiver>
<ft-olark></ft-olark>
</main>
<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.0.6/webcomponents-lite.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/babel-polyfill/6.26.0/polyfill.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/fetch/2.0.3/fetch.min.js"></script>
<script src="/build/application-b8ab977b2a.js" data-root="https://fanteam-game.api.scoutgg.net" data-ws="https://fanteam-game.ws.scoutgg.net" data-auth-url="" data-white-label="fanteam" data-olark="8903-397-10-7512" data-google-analytics="UA-55860585-1"
data-asset-host="https://d34h6ikdffho99.cloudfront.net" data-vapid-public-key="BH8zySo8DKTd9EY0koPSAmA7fo58QTVuFjcB4hTp95WDu21l4dwjckigl0hpYBgeS-6h2kbMtfbXw4u4097wK3w" data-scoutcc="https://scoutcc.scoutgg.net" data-payment-url="https://globpay.fantasy.solutions/v1"
data-projection-url="https://betflex-projection.api.scoutgg.net//api/v1" data-sportsbook-path="https://stage.fenixplayground.es/apuestas/mobilegoto.aspx" data-service-worker="sw.js"></script>
</body>
</html>
Any code like the following fails with a NoSuchElementException:
el = driver.find_element_by_xpath("//div[@class='player-list']")
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class='player-list']"}
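Answer 0 (score: 2)
The HTML of the site you are scraping contains a shadow DOM, and the markup inside it is not reachable through normal element lookups, which is why you get a NoSuchElementException.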
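To confirm that a shadow DOM is involved, one option is to ask the browser which elements host a shadow root. The following is a minimal sketch, not a definitive diagnostic: the helper name `list_shadow_hosts` is hypothetical, `driver` is assumed to be an already opened Selenium WebDriver, and only open shadow roots in the top-level document are detected.

```python
# JavaScript run in the page: collect tag names of elements whose
# `shadowRoot` property is set (closed shadow roots report null).
LIST_SHADOW_HOSTS_JS = """
return Array.from(document.querySelectorAll('*'))
    .filter(el => el.shadowRoot)
    .map(el => el.tagName.toLowerCase());
"""


def list_shadow_hosts(driver):
    """Return the tag names of top-level-document elements with an open shadow root.

    `driver` is any object with Selenium's `execute_script(script)` method.
    Hosts nested inside other shadow roots are not visited.
    """
    return driver.execute_script(LIST_SHADOW_HOSTS_JS)
```

If the returned list contains custom elements such as `view-port`, the content you are looking for is probably rendered inside their shadow roots rather than in the regular document tree.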
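Selenium's regular locators (XPath, CSS) do not search inside shadow roots, so in this case you need JavaScript to reach the data. (Selenium 4 also exposes a `shadow_root` property on elements, which is an alternative on recent browsers.)
To fetch the data with JavaScript, you can use: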
# Assumes the <view-port> element from the HTML above hosts an open shadow root
html = driver.execute_script("return document.querySelector('view-port').shadowRoot.innerHTML")
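When the target sits behind several nested shadow roots, the JavaScript can be generated from a list of CSS selectors. This is a rough sketch under stated assumptions: the helper names `build_shadow_script` and `find_in_shadow` are hypothetical, every selector except the last is assumed to match an element with an open shadow root, and the selectors in the usage note are guesses based on the question, not verified against the live page.

```python
import json


def build_shadow_script(selectors):
    """Build JavaScript that follows `selectors` through nested open shadow roots.

    Every selector except the last must match a shadow host; the last one is
    looked up inside the innermost shadow root. The script returns null if a
    host is missing or has no open shadow root.
    """
    if not selectors:
        raise ValueError("need at least one selector")
    lines = ["let node = document;"]
    for sel in selectors[:-1]:
        # json.dumps produces a safely quoted JavaScript string literal
        lines.append(f"node = node.querySelector({json.dumps(sel)});")
        lines.append("if (!node || !node.shadowRoot) return null;")
        lines.append("node = node.shadowRoot;")
    lines.append(f"return node.querySelector({json.dumps(selectors[-1])});")
    return "\n".join(lines)


def find_in_shadow(driver, selectors):
    """Run the generated script; Selenium returns a WebElement, or None if nothing matched."""
    return driver.execute_script(build_shadow_script(selectors))
```

A call might look like `find_in_shadow(driver, ["view-port", "div.player-list"])`; the real page may nest more hosts in between, so inspect it in the browser's developer tools to get the actual chain.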
Shadow DOM references:
https://medium.com/rate-engineering/a-guide-to-working-with-shadow-dom-using-selenium-b124992559f
https://www.seleniumeasy.com/selenium-tutorials/accessing-shadow-dom-elements-with-webdriver