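I want to scrape the table contents from this page: "https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2". When I use Inspect Element in Chrome, I can find the table entries in the DOM tree shown in the browser. But when I run the following code, I get a different table, the one that corresponds to https://www.premierleague.com/stats/top/players/red_card.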
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

BASEURL = "https://www.premierleague.com/stats/top/players/"
driver = webdriver.Chrome("/Users/manpreet/Downloads/chromedriver")
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
##for i in range(5000):
##    print i
##    time.sleep(1)
try:
    elem = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table'))
    )
finally:
    print('10 secs over')
print(elem.text)
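I increased the WebDriverWait timeout to 30 seconds, but I still did not get the correct table. I noticed that when I use WebDriverWait, the browser opened by Selenium shows the table from https://www.premierleague.com/stats/top/players/red_card for the entire 30 seconds. However, when I do not use WebDriverWait, the driver first shows the table from https://www.premierleague.com/stats/top/players/red_card, the page loads for a few seconds, and then it shows the table from https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2. The whole process takes 5-6 seconds at most. I think the Ajax call gets stuck when I use WebDriverWait. That may be why Selenium does not return the correct table, since Selenium scrapes whatever is currently displayed.
Can someone tell me how to get the correct table?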
Answer 0 (score: 0)
I don't think WebDriverWait can interrupt the page load.
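To illustrate why, here is a simplified sketch of the kind of polling loop that WebDriverWait.until runs. It is not Selenium's actual source: poll_until and WaitTimeout are hypothetical names used only for this illustration, and the real class additionally ignores NoSuchElementException by default and raises TimeoutException. The point is that the loop only calls the condition repeatedly and returns at the first truthy result; it does nothing to the page itself.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def poll_until(driver, condition, timeout=10.0, poll_frequency=0.5):
    """Call condition(driver) repeatedly until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            # Return at the first success, even if the page changes afterwards.
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll_frequency)
```

Because the loop returns as soon as the condition holds, a condition that is already satisfied by the first table to render ends the wait immediately, however long the timeout is.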
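One thing to note here is that even if you request the URL https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2 directly (call it page 2), the application first loads the full list, as at https://www.premierleague.com/stats/top/players/red_card (call it page 1), and only then passes the query parameters to the filters (club and season).
The problem is that an element matching your locator, EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table')), already exists on page 1 itself (that is, at the URL https://www.premierleague.com/stats/top/players/red_card). Selenium therefore gets its element right away, which is why you see the text of page 1 rather than page 2.
What you can do:
- After calling the get method, you can make Python sleep for a while, so that the actual page (page 2) has probably loaded before you try to find the element.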
...
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
time.sleep(3)
try:
...
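- Before the application switches to page 2, you can observe that the filter values are set on the page and that a loader div appears and then disappears. You can wait for the loader to appear and disappear. The code below worked for me.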
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

# wait for the loading to be complete
def waitFn():
    # the loader may show more than once, so check for it a few times
    for x in range(4):
        try:
            # wait for the loader to appear ...
            elem = WebDriverWait(driver, delay).until(
                EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.loader-small'))
            )
            # ... and then for it to disappear again
            elem = WebDriverWait(driver, delay).until(
                EC.invisibility_of_element_located((By.CSS_SELECTOR, 'div.loader-small'))
            )
        except TimeoutException:
            continue

BASEURL = "https://www.premierleague.com/stats/top/players/"
driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
delay = 3 # seconds
##for i in range(5000):
##    print i
# time.sleep(5)
waitFn()
try:
    elem = WebDriverWait(driver, delay).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table'))
    )
except TimeoutException:
    print("Loading took too much time!")
else:
    # only print when the table was actually found
    print(elem.text)
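As a possible alternative to watching the loader (this is not what the code above does), one could wait until the table's text differs from what page 1 showed. The sketch below is a custom wait condition under that assumption; table_text_changed is a hypothetical name, and the broad except is only there to keep the sketch free of imports. A real script would more likely catch NoSuchElementException and StaleElementReferenceException specifically.

```python
class table_text_changed(object):
    """Wait condition: truthy once the located element's text differs from old_text."""

    def __init__(self, locator, old_text):
        self.locator = locator      # e.g. (By.XPATH, "...")
        self.old_text = old_text    # text captured from the first (unfiltered) table

    def __call__(self, driver):
        try:
            # Re-locate on every poll, since the table may be re-rendered.
            element = driver.find_element(*self.locator)
            text = element.text
        except Exception:
            # Assumption: element missing or stale during re-render, so keep polling.
            return False
        if text.strip() and text != self.old_text:
            return element
        return False
```

It would be used by first waiting for the table with EC.presence_of_element_located, storing its .text as old_text, and then calling WebDriverWait(driver, delay).until(table_text_changed(locator, old_text)). One caveat: if the filtered table has already replaced the first one by the time the snapshot is taken, the text never changes and the wait times out, so this is only a sketch and not a guaranteed fix.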
Answer 1 (score: 0)
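You need more waits.
1. Wait until the stats dropdown has closed. You can wait for the value of the CSS 'transform' property to change. See the custom wait element_transform_changed in my answer.
2. Wait until all the filters are displayed. Just use WebDriverWait with EC.
3. Wait a few seconds for the JavaScript to execute. Use time.sleep().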
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

BASEURL = "https://www.premierleague.com/stats/top/players/"
driver = webdriver.Chrome()
driver.get("https://www.premierleague.com/stats/top/players/red_card?se=42&cl=2")
##for i in range(5000):
##    print i
##    time.sleep(1)

# custom wait: truthy once the element's CSS 'transform' differs from the given value
class element_transform_changed(object):
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        wait = WebDriverWait(driver, 20)
        element = wait.until(EC.presence_of_element_located(self.locator))
        newText = element.value_of_css_property("transform")
        if newText is None or len(newText)==0:
            return False
        print("OLD: " + self.text + ", NEW: " + newText)
        if len(self.text)==0 or (self.text!=newText.strip()):
            return element
        else:
            return False

try:
    # 1. wait until the stats dropdown has closed
    WebDriverWait(driver, 40).until(element_transform_changed((By.CSS_SELECTOR, "[data-script='pl_stats'] [class*='topStatsFilterDropdown'] ul"),"matrix(1, 0, 0, 1, 0, 0)"))
    # 2. wait until all the filters are displayed
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='FOOTBALL_COMPSEASON']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='FOOTBALL_CLUB']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='Nationality']")))
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "[data-dropdown-current='Position']")))
    # 3. give the JavaScript a few seconds to execute
    time.sleep(5)
    elem = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, '//*[@id="mainContent"]/div[2]/div/div[2]/div[1]/div[2]/table')))
except Exception:
    print('ERROR')
else:
    # only print when every wait succeeded and the table was found
    print(elem.text)
time.sleep(10)
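If the fixed time.sleep(5) feels fragile, one variation (a different technique from the code above, offered only as a sketch) is to poll the table text until it stops changing. wait_until_stable and its parameters are hypothetical names; the helper knows nothing about Selenium and simply calls whatever reader function it is given.

```python
import time


def wait_until_stable(read_value, timeout=10.0, interval=0.5, required_repeats=3):
    """Poll read_value() until it returns the same non-empty value
    required_repeats times in a row; return that value, or None on timeout."""
    deadline = time.monotonic() + timeout
    last, repeats = None, 0
    while time.monotonic() < deadline:
        value = read_value()
        if value and value == last:
            repeats += 1
        else:
            # New or empty value: restart the count.
            last, repeats = value, (1 if value else 0)
        if repeats >= required_repeats:
            return value
        time.sleep(interval)
    return None
```

With Selenium, the reader could be something like lambda: driver.find_element(By.XPATH, xpath).text, called after the filter waits have passed. Note that the unfiltered table can itself look stable for a moment before the filters are applied, so this is best treated as a replacement for the final sleep only, not for the earlier waits.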