Scraping a JS-loaded page with Selenium

Date: 2019-12-23 14:04:46

Tags: selenium beautifulsoup


I'm trying to scrape some JS-loaded data, such as the total kill count, from https://surviv.io/stats/player787. Can someone tell me how to scrape JS-loaded data with Selenium? Thanks.

Edit: here is some code

from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
b = browser.find_element_by_tag_name('tr')

Selenium does not pick up the 'tr' that contains the data I want.

4 answers:

Answer 0: (score: 2)

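The element isn't found because the page has not fully rendered yet. You can add a Selenium wait so that the script continues only after the specified element has been rendered.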

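In addition, if the data sits in a <table> tag, let pandas do the parsing for you. Once you have the rendered HTML source, pandas.read_html relies on an HTML parser such as lxml or BeautifulSoup under the hood to extract the <table>, <th>, <tr>, and <td> tags and returns them as a list of DataFrames: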

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
import pandas as pd

browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
browser.get('https://surviv.io/stats/player787')
delay = 3 # seconds
WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'player-stats-overview')))

df = pd.read_html(browser.page_source)[0]

print(df.loc[0, 'Kills'])

browser.close()
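
The explicit wait used above boils down to polling a condition until it returns something truthy or a deadline passes. The following is a rough sketch of that idea in plain Python, not Selenium's actual implementation; `wait_until`, `timeout`, and `poll_interval` are hypothetical names chosen for illustration.

```python
import time


def wait_until(condition, timeout=3.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass first.
    This is an illustrative helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # The condition is satisfied; hand back whatever it produced.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        # Sleep briefly so the loop does not spin at full speed.
        time.sleep(poll_interval)
```

With a 3-second delay like the one in the answer, a slow page load would end in a timeout, so a larger value may be safer if the table sometimes takes longer to appear.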

Answer 1: (score: 2)

To get the kill counts, use WebDriverWait together with visibility_of_all_elements_located:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver

browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
allkills = WebDriverWait(browser,20).until(EC.visibility_of_all_elements_located((By.XPATH,"//div[@class='card-mode-stat-name' and text()='KILLS']/following-sibling::div[1]")))
for item in allkills:
    print(item.text)

Answer 2: (score: 1)

You can avoid the overhead of a browser and simply mimic the POST request that the page makes.

import requests

headers = {'content-type': 'application/json; charset=UTF-8'}
data = {"slug":"player787","interval":"all","mapIdFilter":"-1"}
r = requests.post('https://surviv.io/api/user_stats', headers=headers, json=data)
data = r.json()
desired_stats = ['wins', 'kills', 'games', 'kpg'] 
for stat in desired_stats:
    print(stat, ': ' , data[stat])
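
The request above can also be split into small helpers so that a missing key does not raise a KeyError. This is only a sketch under assumptions: it uses the standard library's urllib instead of requests, the helper names (`build_payload`, `pick_stats`, `fetch_stats`) are hypothetical, and the endpoint and field names are simply the ones shown in this answer.

```python
import json
import urllib.request

# Endpoint and payload fields are taken from the answer above;
# they may change if the site changes its API.
API_URL = 'https://surviv.io/api/user_stats'


def build_payload(slug, interval='all', map_id_filter='-1'):
    """Build the JSON body that the stats page sends."""
    return {'slug': slug, 'interval': interval, 'mapIdFilter': map_id_filter}


def pick_stats(data, wanted=('wins', 'kills', 'games', 'kpg')):
    """Keep only the wanted keys; a key absent from the response maps to None."""
    return {key: data.get(key) for key in wanted}


def fetch_stats(slug, timeout=10):
    """POST the payload and return the decoded JSON response (hypothetical helper)."""
    body = json.dumps(build_payload(slug)).encode('utf-8')
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={'Content-Type': 'application/json; charset=UTF-8'},
        method='POST',
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))
```

Calling `pick_stats(fetch_stats('player787'))` would then return a small dictionary holding just the four statistics, assuming the endpoint still responds as it did when the answer was written.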

For the OP:

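You can view the payload in the browser's Network tab by clicking the XHR request whose URL matches the one in my answer (you need to scroll down to see the payload information).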


Answer 3: (score: 0)

To scrape the values 652, 19152, 8926, 2.1, etc. from the JS-loaded page, you have to induce WebDriverWait for visibility_of_all_elements_located(). You can use either of the following Locator Strategies:

  • Using CSS_SELECTOR:

    driver.get('https://surviv.io/stats/player787')
    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.player-stats-overview td")))])
    
  • Using XPATH:

    driver.get('https://surviv.io/stats/player787')
    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='player-stats-overview']//td")))])
    
  • Console output:

    ['652', '19152', '8926', '2.1']
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
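
The console output above is a list of strings, so a small post-processing step can make the values easier to use. The sketch below pairs each cell with a name and converts it to a number. The helper `label_stats` is hypothetical, and the column order (wins, kills, games, kpg) is an assumption based on the statistic names used elsewhere in this thread, not something the page guarantees.

```python
def label_stats(cells, names=('wins', 'kills', 'games', 'kpg')):
    """Pair cell strings with names and convert each to int or float.

    The default `names` order is an assumption about the table layout.
    """
    def to_number(text):
        text = text.strip()
        # Values with a decimal point become floats, the rest ints.
        return float(text) if '.' in text else int(text)

    return {name: to_number(cell) for name, cell in zip(names, cells)}


stats = label_stats(['652', '19152', '8926', '2.1'])
print(stats['kills'])
```

If the table gains or reorders columns, the `names` tuple would need to be adjusted to match.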