Can't click an element in Phantom

Date: 2016-04-20 14:39:40

Tags: javascript node.js web-scraping click phantomjs

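I'm using Node.js and the npm module phantom to scrape a web page. The information I need is loaded by an AJAX request when a particular span is clicked.

Goal: on the site 'www.academiadasapostas.com/stats/team/961#tab=t_stats', I want to click the 'Bundesliga' button and scrape the information it loads.

Problem: I can't go directly to the button's URL (www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961&competition_id=9&page=1), and I don't know how to click the button in Phantom.

My code: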

var url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats';
var phantomPage;
phantomInstance.createPage()
    .then((page) => {
        phantomPage = page;
        return page.open(url);
    })
    .then((status) => {
        // return the promise so the chain waits for evaluate (and sees its errors)
        return phantomPage.evaluate(function() {
            // trying to click the second [data-id] span (Bundesliga)
            return document.querySelectorAll('[data-id]')[1].click();
        });
    })
    .then(function() {
        return phantomPage.property('content');
    })
    .then((content) => {
        // handle content of page
    });
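A likely reason the code above still returns the old HTML: `evaluate` fires the click, but the Bundesliga data arrives later through AJAX, and `property('content')` is read immediately, before that response has been rendered. A minimal sketch of a fix, under that assumption: after clicking, poll a condition inside the page until the new content shows up, and only then read `content`. `waitFor` and `clickThenRead` are hypothetical helpers, not part of the phantom module; the only phantom calls used are `page.evaluate` and `page.property('content')`, which the question already uses.

```javascript
// Node.js side: these helpers run in Node, not inside PhantomJS,
// so modern syntax (async/await, arrow functions) is fine here.

// Hypothetical helper: call `check` until it resolves to a truthy value,
// or reject once `timeoutMs` has elapsed.
async function waitFor(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition not met within ${timeoutMs} ms`);
    }
    // Sleep before polling again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical helper: click inside the page, wait for the AJAX result,
// then read the HTML. `clickInPage` and `isLoadedInPage` are serialized by
// phantom and run inside the page, so they must be plain ES5 functions.
async function clickThenRead(page, clickInPage, isLoadedInPage, options) {
  await page.evaluate(clickInPage);
  await waitFor(() => page.evaluate(isLoadedInPage), options);
  return page.property('content');
}
```

With the question's variables this would be called as `clickThenRead(phantomPage, clickFn, loadedFn)`, where `clickFn` clicks the span and `loadedFn` returns `true` once the Bundesliga table is present. Polling a concrete condition is usually more reliable than a fixed `setTimeout`, because the AJAX response time varies.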

HTML snapshot:

<td> 
    <span class="competition all " data-id="0" onclick="teamAjax_Filterchange(this)" style="float: left; display: none;">Tudo
    </span>
    <span class="competition " data-id="9" onclick="teamAjax_Filterchange(this)">                                  
        <ul class="flag" title=""><li class="ar a80" title=""></li><li class="co c1"></li><li class="co chover"></li></ul>Bundesliga
    </span>
    <span class="competition " data-id="10" onclick="teamAjax_Filterchange(this)">                                     
        <ul class="flag" title=""><li class="ar a7" title=""></li><li class="co clc"></li><li class="co chover"></li></ul>UEFA Champions League
    </span>
</td>
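Given this markup, here is a sketch of a more robust click, assuming the live page matches the snapshot: select the span by its `data-id` (9 for Bundesliga) instead of by position, and invoke the inline `onclick` handler directly, falling back to a synthetic click event. This also sidesteps the fact that older PhantomJS releases (1.x) do not define `click()` on non-form elements such as `span`. `clickCompetition` is a hypothetical name; it is meant to be passed as `phantomPage.evaluate(clickCompetition, 9)`, so it runs inside the page and sticks to ES5 syntax.

```javascript
// Runs INSIDE the page, e.g. phantomPage.evaluate(clickCompetition, 9).
// PhantomJS's engine predates ES2015, so no arrow functions or template literals.
function clickCompetition(competitionId) {
  // Select by data-id rather than querySelectorAll('[data-id]')[1], which
  // breaks if any other element with a data-id appears earlier on the page.
  var span = document.querySelector(
    'span.competition[data-id="' + competitionId + '"]'
  );
  if (!span) {
    return false; // not rendered yet, or wrong id
  }
  if (typeof span.onclick === 'function') {
    // The inline onclick="teamAjax_Filterchange(this)" attribute is exposed
    // as span.onclick; calling it as a method keeps `this` bound to the span.
    span.onclick();
  } else {
    // Fallback: dispatch a bubbling, cancelable synthetic click.
    var ev = document.createEvent('MouseEvents');
    ev.initMouseEvent('click', true, true, window, 0,
      0, 0, 0, 0, false, false, false, false, 0, null);
    span.dispatchEvent(ev);
  }
  return true;
}
```

Returning `false` when the span is missing lets the Node side tell "element not found" apart from "clicked, waiting for data".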

Edit 1: I also tried this, but it doesn't seem to work either:

phantomPage.evaluate(function() { 
    var ev = document.createEvent("MouseEvent");
    ev.initMouseEvent(
        "click",
        true /* bubble */, true /* cancelable */,
        window, null,
        0, 0, 0, 0, /* coordinates */
        false, false, false, false, /* modifier keys */
        0 /*left*/, null
    );
    return document.querySelectorAll('[data-id]')[1].dispatchEvent(ev);
})

1 Answer:

Answer 0 (score: 0)

I was able to scrape that page with Python and PhantomJS (through Selenium) using the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961'
driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768)

# XPath of the Bundesliga span
xpath_IN = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/div/table/tbody/tr[1]/td[2]/span[2]"
driver.get(url)

WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, xpath_IN)))
driver.find_element_by_xpath(xpath_IN).click()

# XPath of the last stats row, which only appears once the AJAX load has finished
xpath_IN = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/table[2]/tbody/tr[19]/td[1]"
WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, xpath_IN)))

soup = BeautifulSoup(driver.page_source, 'lxml')
# Write the prettified HTML to a file to check that everything loaded
with open('temp.txt', 'w') as f:
    f.write(soup.prettify())

driver.quit()  # quit() also ends the PhantomJS process; close() only closes the window

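I locate the Bundesliga button by its XPath and click it. I then use another XPath, this time for the last row that appears after a successful click (Cartões vermelhos). Waiting for that row makes the script wait until all items have loaded after the click.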
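The same waiting strategy carries over to the Node.js setup in the question. A sketch, assuming the answer's absolute XPath still matches the live page: a page-side predicate that reports whether the last stats row exists, which can serve as the condition to poll after clicking, e.g. `phantomPage.evaluate(xpathExists, LAST_ROW_XPATH)`. `xpathExists` and `LAST_ROW_XPATH` are hypothetical names; `document.evaluate` and `XPathResult` are standard DOM APIs.

```javascript
// Runs INSIDE the page via phantomPage.evaluate(xpathExists, LAST_ROW_XPATH).
// ES5 only, like everything executed inside PhantomJS.
function xpathExists(xpath) {
  var result = document.evaluate(
    xpath, document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null
  );
  // singleNodeValue is null when nothing matches the expression.
  return result.singleNodeValue !== null;
}

// The row the answer waits for (the last stats row, "Cartões vermelhos"),
// copied from the Python code above.
var LAST_ROW_XPATH =
  ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/table[2]/tbody/tr[19]/td[1]";
```

An absolute path like this breaks whenever the site's layout changes; an expression anchored on stable attributes (for example `//span[@data-id='9']` for the button) tends to survive longer.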

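I parse the page with BeautifulSoup and write it out "prettified" to confirm that everything loaded correctly.

If you're not familiar with XPath, install the Firebug and FirePath add-ons in Firefox; right-clicking the element you want then gives you its XPath.

Hope this helps.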