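I'm trying to scrape some JS-loaded data, such as the total number of kills, from https://surviv.io/stats/player787. Can someone tell me how to scrape JS-loaded data with Selenium? Thanks.
EDIT: Here is some code: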
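from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
# Selenium 4 removed find_element_by_tag_name; find_element(By.TAG_NAME, ...) is the replacement
b = browser.find_element(By.TAG_NAME, 'tr')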
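Selenium does not pick up the 'tr' that contains the data I want.
Answer 0 (score: 2)
It can't find the element because the page has not finished rendering when the lookup runs. Add a Selenium explicit wait so the script does not continue until the specified element has been rendered.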
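An explicit wait is essentially a polling loop: WebDriverWait re-evaluates its condition (every 0.5 s by default) until the condition returns something truthy, and raises TimeoutException if the timeout expires first. The sketch below is a simplified, browser-free illustration of that idea, not Selenium's own code; `wait_until` and `fake_find_row` are hypothetical names, and the "row appears on the third lookup" behavior is made up to simulate a JS-rendered element.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll_frequency: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Simplified stand-in for WebDriverWait(...).until(...); not Selenium's implementation.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # condition satisfied: hand back its value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)  # pause before polling again


calls = {"n": 0}


def fake_find_row():
    # Hypothetical lookup: pretend the <tr> only exists from the third attempt on
    calls["n"] += 1
    return "<tr>...</tr>" if calls["n"] >= 3 else None


print(wait_until(fake_find_row, timeout=2, poll_frequency=0.01))
```

A plain `find_element` call is the equivalent of calling `condition()` exactly once, which is why it fails on a page that is still rendering.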
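Also, if the data is in a <table> tag, let pandas do the parsing for you: once you have the rendered HTML source, pd.read_html extracts the <table>, <th>, <tr>, and <td> tags (using lxml, or BeautifulSoup with html5lib, under the hood) and returns each table as a DataFrame in a list: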
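from io import StringIO
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# Selenium 4 takes the driver path through a Service object
browser = webdriver.Chrome(service=Service('C:/chromedriver_win32/chromedriver.exe'))
browser.get('https://surviv.io/stats/player787')
delay = 3  # seconds
try:
    # Block until the stats table is present in the DOM
    WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.CLASS_NAME, 'player-stats-overview')))
    # Newer pandas versions warn on raw HTML strings, so wrap the source in StringIO
    df = pd.read_html(StringIO(browser.page_source))[0]
    print(df.loc[0, 'Kills'])
except TimeoutException:
    print('Stats table did not load within', delay, 'seconds')
finally:
    browser.quit()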
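To see roughly what read_html pulls out of browser.page_source without launching a browser, here is a stdlib-only sketch that swaps pandas for Python's html.parser. The markup is hypothetical: the player-stats-overview class comes from the answers, the Kills header matches the column used above, but the other header names and the column order are assumptions, and the values are the ones shown in a later answer's console output. Unlike read_html, this does not build DataFrames or keep separate tables apart.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    """Collect the text of <th>/<td> cells, row by row, from all tables in the input."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text buffer for the cell being read, or None outside cells

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)


# Hypothetical rendered markup standing in for browser.page_source
html = """
<table class="player-stats-overview">
  <tr><th>Wins</th><th>Kills</th><th>Games</th><th>K/G</th></tr>
  <tr><td>652</td><td>19152</td><td>8926</td><td>2.1</td></tr>
</table>
"""
parser = TableCells()
parser.feed(html)
header, *body = parser.rows
record = dict(zip(header, body[0]))  # first data row keyed by header text
print(record["Kills"])  # → 19152
```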
Answer 1 (score: 2)
To get the kill counts, use WebDriverWait with visibility_of_all_elements_located:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
browser = webdriver.Firefox()
browser.get('https://surviv.io/stats/player787')
# Wait up to 20 s for every value <div> that directly follows a "KILLS" label
allkills = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='card-mode-stat-name' and text()='KILLS']/following-sibling::div[1]")))
for item in allkills:
    print(item.text)
browser.quit()
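The XPath above finds each div whose class is card-mode-stat-name and whose text is KILLS, then takes the first div sibling after it. The sketch below reproduces that lookup by hand with xml.etree.ElementTree (which does not support the following-sibling axis), so it can run without a browser. The markup is a hypothetical, well-formed stand-in for the rendered stat cards: only the card-mode-stat-name class and the KILLS label come from the answer, and the numbers are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical stat-card markup; the values are placeholders, not real stats
MARKUP = """
<div>
  <div class="card"><div class="card-mode-stat-name">KILLS</div><div class="card-mode-stat-value">10</div></div>
  <div class="card"><div class="card-mode-stat-name">WINS</div><div class="card-mode-stat-value">1</div></div>
  <div class="card"><div class="card-mode-stat-name">KILLS</div><div class="card-mode-stat-value">20</div></div>
</div>
"""


def following_sibling_texts(root, cls, label):
    """Hand-rolled equivalent of //div[@class=cls and text()=label]/following-sibling::div[1]."""
    results = []
    for parent in root.iter():  # document order, root included
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag == "div" and child.get("class") == cls and child.text == label:
                # first <div> after the label among the same parent's children
                nxt = next((c for c in children[i + 1:] if c.tag == "div"), None)
                if nxt is not None:
                    results.append(nxt.text)
    return results


root = ET.fromstring(MARKUP)
print(following_sibling_texts(root, "card-mode-stat-name", "KILLS"))  # → ['10', '20']
```

Because the label and value sit in the same parent card, the [1] predicate is what keeps each match to the single value next to its own label.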
Answer 2 (score: 1)
You can avoid the overhead of a browser and simply mimic the POST request the page makes.
import requests
headers = {'content-type': 'application/json; charset=UTF-8'}
payload = {"slug": "player787", "interval": "all", "mapIdFilter": "-1"}
# Same JSON POST the stats page sends (visible as an XHR in the browser's Network tab)
r = requests.post('https://surviv.io/api/user_stats', headers=headers, json=payload)
r.raise_for_status()  # fail loudly on a 4xx/5xx response
data = r.json()
desired_stats = ['wins', 'kills', 'games', 'kpg']
for stat in desired_stats:
    print(stat, ':', data[stat])
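For the OP:
You can see the payload in the Network tab when you click the XHR matching the URL in my answer (scroll down in the request details to find the payload).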
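The same request can be assembled with only the standard library, which makes its shape easy to inspect before anything is sent. This is a sketch using urllib.request in place of requests; the URL, payload fields, and header are copied from the answer, while `build_stats_request` and `pick_stats` are hypothetical helper names. The endpoint may no longer respond, so the actual send is left commented out.

```python
import json
import urllib.request

API_URL = "https://surviv.io/api/user_stats"  # endpoint as given in the answer


def build_stats_request(slug, interval="all", map_id_filter="-1"):
    """Build (but do not send) the JSON POST the stats page makes."""
    payload = {"slug": slug, "interval": interval, "mapIdFilter": map_id_filter}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def pick_stats(response_json, keys=("wins", "kills", "games", "kpg")):
    """Keep only the requested fields; a missing key maps to None instead of raising."""
    return {k: response_json.get(k) for k in keys}


req = build_stats_request("player787")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it (needs network access and a live endpoint):
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(pick_stats(json.load(resp)))
```

Using `.get` in `pick_stats` is a design choice: if the API drops a field, the loop prints None rather than stopping with a KeyError as `data[stat]` would.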
Answer 3 (score: 0)
To scrape values such as 652, 19152, 8926 and 2.1 from the JS-loaded page, you have to use WebDriverWait with visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
driver.get('https://surviv.io/stats/player787')
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table.player-stats-overview td")))])
Using XPATH:
driver.get('https://surviv.io/stats/player787')
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='player-stats-overview']//td")))])
Console output:
['652', '19152', '8926', '2.1']
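The locators return strings, so they usually need converting before use. A minimal sketch, assuming the four cells are wins, kills, games and kills per game in that order (the order is inferred from the stat list in the requests-based answer, not confirmed by the page); the kills/games check is there to catch a wrong guess about that order.

```python
# Strings as printed in the console output above
cells = ['652', '19152', '8926', '2.1']

# Assumed column order (matches the stat list used in the requests-based answer)
names = ['wins', 'kills', 'games', 'kpg']


def to_number(text):
    """Convert a cell string to int when possible, else float (thousands commas stripped)."""
    text = text.replace(',', '').strip()
    try:
        return int(text)
    except ValueError:
        return float(text)


stats = {name: to_number(cell) for name, cell in zip(names, cells)}
print(stats)  # → {'wins': 652, 'kills': 19152, 'games': 8926, 'kpg': 2.1}

# Sanity check on the assumed order: kills per game should round to the kpg cell
assert round(stats['kills'] / stats['games'], 1) == stats['kpg']
```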
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC