Why do I get the output I want when I add time.sleep(2), but fewer results when I instead wait for a specific XPath?
Output with time.sleep(2) (this is the output I want):
Adelaide Utd
Tottenham
Dundee Fc
...
Count: 145 names
With time.sleep removed:
Adelaide Utd
Tottenham
Dundee Fc
...
Count: 119 names
I have added:
clickMe = wait(driver, 13).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ("#page-container > div:nth-child(4) > div > div.ubet-sports-section-page > div > div:nth-child(2) > div > div > div:nth-child(1) > div > div > div.page-title-new > h1"))))
since this element appears on every page.
I seem to get far fewer results. How can I fix this?
Script:
import csv
import os
import time
from selenium import webdriver
from selenium.common.exceptions import (NoSuchElementException,
                                        StaleElementReferenceException,
                                        TimeoutException)
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import WebDriverWait as wait

driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

driver.get('https://ubet.com/sports/soccer')

# Wait for the sport drop-down, then count its options.
clickMe = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, ('//select[./option="Soccer"]/option'))))
options = driver.find_elements(By.XPATH, '//select[./option="Soccer"]/option')

indexes = [index for index in range(len(options))]
for index in indexes:
    try:
        # Select the next league in the drop-down.
        try:
            zz = wait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '(//select/optgroup/option)[%s]' % str(index + 1))))
            zz.click()
        except StaleElementReferenceException:
            pass

        # Wait for the page title, which is present on every page.
        clickMe = wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ("#page-container > div:nth-child(4) > div > div.ubet-sports-section-page > div > div:nth-child(2) > div > div > div:nth-child(1) > div > div > div.page-title-new > h1"))))

        # Scrape the team names.
        langs0 = driver.find_elements(
            By.CSS_SELECTOR,
            "div > div > div > div > div > div > div > div > div.row.collapse > div > div > div:nth-child(2) > div > div > div > div > div > div.row.small-collapse.medium-collapse > div:nth-child(1) > div > div > div > div.lbl-offer > span")
        langs0_text = []
        for lang in langs0:
            try:
                langs0_text.append(lang.text)
            except StaleElementReferenceException:
                pass

        # Append one name per row to the CSV file.
        directory = 'C:\\A.csv'
        with open(directory, 'a', newline='', encoding="utf-8") as outfile:
            writer = csv.writer(outfile)
            for row in zip(langs0_text):
                writer.writerow(row)
    except StaleElementReferenceException:
        pass
You will need a VPN if you cannot access the page.
Update:
Maybe that element loads before the others. So I changed the wait to target the data being scraped instead (not every page has data to scrape).
Added:
try:
    clickMe = wait(driver, 13).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ("div > div > div > div > div > div > div > div > div.row.collapse > div > div > div:nth-child(2) > div > div > div > div > div > div.row.small-collapse.medium-collapse > div:nth-child(3) > div > div > div > div.lbl-offer > span"))))
except TimeoutException as ex:
    pass
The same problem persists.
Manual steps:
1. Load: driver.get('https://ubet.com/sports/soccer')
2. Click a drop-down option: (//select/optgroup/option)
3. Wait for the page elements so they can be scraped
4. Scrape:
div > div > div > div > div > div > div > div > div.row.collapse > div > div > div:nth-child(2) > div > div > div > div > div > div.row.small-collapse.medium-collapse > div:nth-child(1) > div > div > div > div.lbl-offer > span
5. Loop and repeat.
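Answer 0 (score: 6)
The site is built on AngularJS, so your best bet is to wait until Angular has finished processing all AJAX requests (I won't go into the underlying mechanics, but there is plenty of material on the topic around the web). For this, I usually define a custom expected condition to check while waiting: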
class NgReady:

    js = ('return (window.angular !== undefined) && '
          '(angular.element(document).injector() !== undefined) && '
          '(angular.element(document).injector().get("$http").pendingRequests.length === 0)')

    def __call__(self, driver):
        return driver.execute_script(self.js)

# NgReady does not have any internal state, so one instance
# can be reused for waiting multiple times
ng_ready = NgReady()
Now use it to wait after zz.click():
zz.click()
wait(driver, 10).until(ng_ready)
The original code, unmodified (no sleep and no waiting for ng_ready):
$ python so-47954604.py && wc -l out.csv && rm out.csv
86 out.csv
With time.sleep(10) after zz.click():
$ python so-47954604.py && wc -l out.csv && rm out.csv
101 out.csv
The same result with wait(driver, 10).until(ng_ready) after zz.click():
$ python so-47954604.py && wc -l out.csv && rm out.csv
101 out.csv
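To see why the choice of wait condition changes the row count, here is a minimal sketch that needs no browser. Everything in it is a stand-in for illustration: FakePage is a hypothetical object that pretends the title is rendered immediately while each poll lets one pending AJAX request finish, and until() is a simplified polling loop in the spirit of WebDriverWait.until, not Selenium's actual implementation.

```python
import time


class FakePage:
    """Hypothetical stand-in for a driver: the title renders at once,
    while each poll lets one pending AJAX request finish and add a row."""

    def __init__(self, pending_requests):
        self.pending_requests = pending_requests
        self.rows = []

    def title_present(self):
        # The <h1> is already in the DOM before any data arrives.
        return True

    def requests_idle(self):
        # Simulate one request completing per poll.
        if self.pending_requests:
            self.pending_requests -= 1
            self.rows.append('team %d' % len(self.rows))
            return False
        return True


def until(driver, condition, timeout=1.0, poll=0.01):
    """Simplified polling loop: call condition(driver) until it is truthy."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() > deadline:
            raise TimeoutError('condition not met in time')
        time.sleep(poll)


# Waiting only for the title returns before any data has loaded.
early = FakePage(pending_requests=3)
until(early, lambda d: d.title_present())
print(len(early.rows))

# Waiting for idle requests returns only once every row is present.
settled = FakePage(pending_requests=3)
until(settled, lambda d: d.requests_idle())
print(len(settled.rows))
```

A condition keyed to an element that renders early is satisfied before the data arrives, which is consistent with the smaller counts in the question; a condition keyed to outstanding requests is not.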
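NgReady is not my invention; I just ported it to Python from an expected condition implemented in Java that I found here, so all credit goes to the author of that answer.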
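Answer 1 (score: 4)
The logic used in NgReady only checks that angular is defined and that there are no pending requests. Even though it works for this site, it is not a definitive answer to the question of whether Angular is ready to work with.
If we look at what Protractor, the Angular end-to-end testing framework, does to "sync" with Angular, it uses the "Testability" API built into Angular.
There is also the pytractor package, which extends selenium webdriver instances with a WebDriverMixin that automatically keeps the driver and Angular in sync on every interaction.
You can either start using pytractor directly (although it has been abandoned as a package), or we can try to apply the ideas implemented there in order to always keep our webdriver synced with Angular. For that, let's create this waitForAngular.js script (we'll use only the Angular 1 and 2 support logic; we can always extend it using the relevant Protractor client-side script):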
try { return (function (rootSelector, callback) {
  var el = document.querySelector(rootSelector);

  try {
    if (!window.angular) {
      throw new Error('angular could not be found on the window');
    }

    if (angular.getTestability) {
      angular.getTestability(el).whenStable(callback);
    } else {
      if (!angular.element(el).injector()) {
        throw new Error('root element (' + rootSelector + ') has no injector.' +
           ' this may mean it is not inside ng-app.');
      }
      angular.element(el).injector().get('$browser').
          notifyWhenNoOutstandingRequests(callback);
    }
  } catch (err) {
    callback(err.message);
  }
}).apply(this, arguments); }
catch(e) { throw (e instanceof Error) ? e : new Error(e); }
Then, let's subclass webdriver.Chrome and patch the execute() method, so that every time there is an interaction we first check that Angular is ready:
import csv

from selenium import webdriver
from selenium.webdriver.remote.command import Command
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.common.by import By
from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.support import expected_conditions as EC


COMMANDS_NEEDING_WAIT = [
    Command.CLICK_ELEMENT,
    Command.SEND_KEYS_TO_ELEMENT,
    Command.GET_ELEMENT_TAG_NAME,
    Command.GET_ELEMENT_VALUE_OF_CSS_PROPERTY,
    Command.GET_ELEMENT_ATTRIBUTE,
    Command.GET_ELEMENT_TEXT,
    Command.GET_ELEMENT_SIZE,
    Command.GET_ELEMENT_LOCATION,
    Command.IS_ELEMENT_ENABLED,
    Command.IS_ELEMENT_SELECTED,
    Command.IS_ELEMENT_DISPLAYED,
    Command.SUBMIT_ELEMENT,
    Command.CLEAR_ELEMENT
]


class ChromeWithAngular(webdriver.Chrome):
    def __init__(self, root_element, *args, **kwargs):
        self.root_element = root_element

        with open("waitForAngular.js") as f:
            self.script = f.read()

        super(ChromeWithAngular, self).__init__(*args, **kwargs)

    def wait_for_angular(self):
        self.execute_async_script(self.script, self.root_element)

    def execute(self, driver_command, params=None):
        if driver_command in COMMANDS_NEEDING_WAIT:
            self.wait_for_angular()

        return super(ChromeWithAngular, self).execute(driver_command, params=params)


driver = ChromeWithAngular(root_element='body')

# the rest of the code as is with what you had
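One possible refinement, sketched under assumptions: waitForAngular.js passes err.message to the callback when something goes wrong, and on the AngularJS path shown it appears to call back with no argument on success, so a truthy return value from execute_async_script would indicate an error. The helper below surfaces that message instead of discarding it. AngularSyncError and the standalone wait_for_angular function are hypothetical names introduced here for illustration, not part of Selenium or pytractor.

```python
class AngularSyncError(RuntimeError):
    """Hypothetical exception for an error reported by waitForAngular.js."""


def wait_for_angular(driver, script, root_element):
    """Run the wait script and raise if it reports an error message.

    Assumes the script calls back with no argument on success (the driver
    then returns None) and with err.message on failure.
    """
    error = driver.execute_async_script(script, root_element)
    if error:
        raise AngularSyncError(error)


class StubDriver:
    """Stand-in driver used only to exercise the helper without a browser."""

    def __init__(self, result):
        self.result = result
        self.calls = []

    def execute_async_script(self, script, *args):
        self.calls.append((script, args))
        return self.result


# Success: the script reported nothing, so the helper returns quietly.
ok = StubDriver(None)
wait_for_angular(ok, 'waitForAngular.js contents', 'body')

# Failure: the script's message is raised instead of being ignored.
bad = StubDriver('angular could not be found on the window')
try:
    wait_for_angular(bad, 'waitForAngular.js contents', 'body')
except AngularSyncError as exc:
    print('sync failed:', exc)
```

Inside ChromeWithAngular, the same check could replace the body of wait_for_angular(), so a page without Angular fails loudly rather than silently skipping the synchronization.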
Again, this is heavily inspired by the pytractor and protractor projects.