Getting the value of a JavaScript function call using Selenium

Time: 2015-05-27 13:25:22

Tags: python selenium selenium-webdriver web-scraping scrapy

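I'm using python-scrapy to scrape web pages, and it works very well for static content. I'm trying to scrape a URL from this page, but it turns out the URL is returned by a JavaScript call. I'm using Selenium for that, but I can't figure out how to do it.

If you click "size chart" on the given page, you'll see a popup showing the size guide. How can I get the URL of this guide in my program?

I've run into a similar problem on Koovs, again when trying to get the size guide. If anyone can point me to relevant links, I'd really appreciate it.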

1 answer:

Answer 0 (score: 1)

Locate the "size chart" link by its link text, click it, and extract the data, for example:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3')

wait = WebDriverWait(driver, 10)
chart = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "size chart")))
chart.click()

# Print the text of each header cell in the size chart table
for title in driver.find_elements(By.CSS_SELECTOR, "div.size-chart-body div.size-chart table th"):
    print(title.text)

driver.close()

Prints (just the table header row, for the sake of example):

Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)

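Note that you don't need Selenium to get the size chart data: it is already in the DOM, just not visible until you click "size chart". You can reach the same size chart with Scrapy alone. Demo from the Scrapy shell: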

$ scrapy shell "http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3"
In [1]: for title in response.css("div.size-chart-body div.size-chart table th")[1:]:
   ...:     print(title.xpath("text()").extract()[0])
   ...:     
Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)
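
To illustrate why a plain HTTP fetch is enough here, below is a minimal standard-library sketch. The HTML snippet is made up for illustration (it only imitates the class names used in the selectors above and is not the real Jabong markup), and `HeaderCollector` and `extract_headers` are hypothetical helper names. It shows that an element hidden with CSS is still present in the raw HTML and can be parsed without a browser:

```python
from html.parser import HTMLParser

# Made-up markup (an assumption): the chart is hidden via CSS
# but is still part of the HTML source the server returns.
SAMPLE_HTML = """
<div class="size-chart-body" style="display: none">
  <div class="size-chart">
    <table>
      <tr><th>Indian Size</th><th>Euro Size</th></tr>
      <tr><td>S</td><td>36</td></tr>
    </table>
  </div>
</div>
"""


class HeaderCollector(HTMLParser):
    """Collects the text of every <th> cell, regardless of CSS visibility."""

    def __init__(self):
        super().__init__()
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False

    def handle_data(self, data):
        # Only accumulate text while inside a <th> element
        if self.in_th:
            self.headers[-1] += data


def extract_headers(html):
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headers]


print(extract_headers(SAMPLE_HTML))
```

A parser never evaluates `display: none`, which is why the Scrapy selector in the shell session finds the header cells without any click.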

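For Koovs, you can still avoid Selenium and build the size chart URL manually by extracting the category name and the deal ID from the page, for example: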

$ scrapy shell "http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651"
In [1]: category = response.xpath("//input[@id='master_category_name_id_ref']/@value").extract()[0]

In [2]: deal = response.xpath("//input[@id='deal_id']/@value").extract()[0]
In [3]: "http://www.koovs.com/koovs/sizechart/women/{category}/{deal}".format(category=category, deal=deal)
Out[3]: 'http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554'
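
The URL construction above could be wrapped in a small function. This is only a sketch: `build_size_chart_url` and `SIZE_CHART_TEMPLATE` are hypothetical names, and the `section` parameter is an assumption. The original example hardcodes `women`, and it has not been verified that other sections of the site follow the same pattern.

```python
# URL pattern taken from the shell session above; "section" replaces the
# hardcoded "women" segment (an unverified assumption for other sections).
SIZE_CHART_TEMPLATE = "http://www.koovs.com/koovs/sizechart/{section}/{category}/{deal}"


def build_size_chart_url(category, deal, section="women"):
    """Fill the URL pattern with the values scraped from the product page."""
    return SIZE_CHART_TEMPLATE.format(section=section, category=category, deal=deal)


# Values as extracted in the shell session above
url = build_size_chart_url("Shirts--651--799--896", "59554")
print(url)
```

In a spider, `category` and `deal` would come from the two hidden `input` elements queried with XPath in the shell session.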

And if you still want to use Selenium, here you go:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651&skuid=236376')

wait = WebDriverWait(driver, 10)
chart = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[size_chart]")))
chart.click()

driver.switch_to.window(driver.window_handles[-1])

print(driver.current_url)

driver.close()

Prints:

http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554