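I'm using Node.js and the npm module phantom to scrape a web page. The information I need is loaded by an AJAX request when a span is clicked.
Goal: on the page 'www.academiadasapostas.com/stats/team/961#tab=t_stats', I want to click the 'Bundesliga' button and scrape the information that loads.
Problem: I can't go directly to the button's URL (www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961&competition_id=9&page=1), and I don't know how to click the button in Phantom.
My code: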
var url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats';
phantomInstance.createPage()
    .then((page) => {
        phantomPage = page;
        return page.open(url);
    })
    .then((status) => {
        phantomPage.evaluate(function() {
            // trying to click
            return document.querySelectorAll('[data-id]')[1].click();
        })
        .then(function() {
            return phantomPage.property('content');
        })
        .then((content) => {
            // handle content of page
        });
    });
HTML snapshot:
<td>
    <span class="competition all " data-id="0" onclick="teamAjax_Filterchange(this)" style="float: left; display: none;">Tudo
    </span>
    <span class="competition " data-id="9" onclick="teamAjax_Filterchange(this)">
        <ul class="flag" title=""><li class="ar a80" title=""></li><li class="co c1"></li><li class="co chover"></li></ul>Bundesliga
    </span>
    <span class="competition " data-id="10" onclick="teamAjax_Filterchange(this)">
        <ul class="flag" title=""><li class="ar a7" title=""></li><li class="co clc"></li><li class="co chover"></li></ul>UEFA Champions League
    </span>
</td>
Edit 1: I also tried this, but it doesn't seem to work either:
phantomPage.evaluate(function() {
    var ev = document.createEvent("MouseEvent");
    ev.initMouseEvent(
        "click",
        true /* bubble */, true /* cancelable */,
        window, null,
        0, 0, 0, 0, /* coordinates */
        false, false, false, false, /* modifier keys */
        0 /* left */, null
    );
    return document.querySelectorAll('[data-id]')[1].dispatchEvent(ev);
})
Answer 0 (score: 0)
I was able to scrape that page using Python and PhantomJS with the following code:
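from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961'
# XPath of the Bundesliga filter button
button_xpath = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/div/table/tbody/tr[1]/td[2]/span[2]"
# XPath of the last row that appears once the click has loaded the data
last_row_xpath = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/table[2]/tbody/tr[19]/td[1]"

# Note: webdriver.PhantomJS was removed in Selenium 4, so this needs Selenium 3.x
driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768)
try:
    driver.get(url)
    # Wait up to 40 s for the button to be present, then click it
    button = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, button_xpath)))
    button.click()
    # Wait for the AJAX response to render the last row
    WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, last_row_xpath)))
    soup = BeautifulSoup(driver.page_source, 'lxml')
    # Dump the prettified page to a file to confirm everything loaded
    with open('temp.txt', 'w', encoding='utf-8') as f:
        f.write(soup.prettify())
finally:
    # Always shut down the PhantomJS process, even if a wait times out
    driver.quit()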
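I located the Bundesliga button by its XPath and clicked it. Then I used a second XPath, pointing at the last row that appears after a successful click (Cartões vermelhos). Waiting for that row makes the script pause until everything has loaded after the click.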
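The wait-after-click idea does not depend on Selenium. As a minimal sketch of the underlying polling pattern, the helper below retries a check until it succeeds or a timeout expires. The names `wait_until` and `last_row_present` are hypothetical and exist only for this illustration; they are not part of Selenium or PhantomJS, and the stand-in check simply becomes true on its third call instead of inspecting a real page.

```python
import time


def wait_until(condition, timeout=40.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Hypothetical stand-in for "is the last row in the DOM yet?".
# It returns None twice, then the row label on the third check.
attempts = []


def last_row_present():
    attempts.append(1)
    return "Cartões vermelhos" if len(attempts) >= 3 else None


row = wait_until(last_row_present, timeout=5.0, poll_interval=0.01)
print(row, "found after", len(attempts), "checks")
```

In a real scraper the condition would query the page for the element, which is what the explicit wait in the code above is used for.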
I used BeautifulSoup to read the page and wrote it out "prettified" to confirm that everything had loaded correctly.
If you are not familiar with XPath, install the Firebug and FirePath add-ons in Firefox; right-clicking the element you want then gives you its XPath.
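XPath expressions copied from a browser tool tend to be long positional paths. As a hedged alternative, the snapshot in the question suggests the button could also be selected by its `data-id` attribute. The sketch below tries that expression against a trimmed, well-formed copy of the snapshot using only the standard library. Whether `data-id="9"` stays stable on the live page is an assumption, and real pages are often not valid XML, so `ElementTree` serves here only to illustrate the expression.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the HTML snapshot from the question.
snapshot = """
<td>
  <span class="competition all " data-id="0" onclick="teamAjax_Filterchange(this)">Tudo</span>
  <span class="competition " data-id="9" onclick="teamAjax_Filterchange(this)">
    <ul class="flag" title=""><li class="ar a80" title=""></li></ul>Bundesliga
  </span>
  <span class="competition " data-id="10" onclick="teamAjax_Filterchange(this)">
    <ul class="flag" title=""><li class="ar a7" title=""></li></ul>UEFA Champions League
  </span>
</td>
"""

root = ET.fromstring(snapshot.strip())

# Select the span by attribute value instead of by its position in the tree.
button_xpath = ".//span[@data-id='9']"
button = root.find(button_xpath)

# The label is the text that follows the nested <ul>, so collect all text nodes.
label = "".join(button.itertext()).strip()
print(label)
```

With Selenium, the same kind of expression (for example `//span[@data-id='9']`) could be passed to `By.XPATH` in place of the positional path.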
Hope this helps.