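I am using python-scrapy to scrape web pages, which works very well for static content. I am trying to scrape a URL from this page, but it turns out the URL is returned by a JavaScript call. I am using Selenium for this, but I can't figure out how to do it.
If you click "size chart" on the given link, a popup with the size guide appears. How can I get the URL of this guide in my program?
I am facing a similar problem on koovs as well when trying to get the size guide. If anyone could point me to any relevant links, I would really appreciate it.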
Answer 0 (score: 1)
Locate the "size chart" link by its link text, click it, and extract the data, for example:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Firefox()
driver.get('http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3')

# Wait up to 10 seconds for the "size chart" link to appear, then open the popup.
wait = WebDriverWait(driver, 10)
chart = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "size chart")))
chart.click()

# Print the header cells of the size chart table.
for title in driver.find_elements(By.CSS_SELECTOR, "div.size-chart-body div.size-chart table th"):
    print(title.text)

driver.close()
Prints (the table header row, for the sake of an example):
Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)
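Note that you don't need selenium to get the size chart data: it is already inside the DOM, just not visible until you click "size chart". You can reach the same size chart with Scrapy alone. Demo from the Scrapy shell: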
$ scrapy shell 'http://www.jabong.com/athena-Red-Black-Top-476472.html?pos=3'
In [1]: for title in response.css("div.size-chart-body div.size-chart table th")[1:]:
   ...:     print(title.xpath("text()").extract()[0])
   ...:
Indian Size
Euro Size
Garment Bust (In.)
Garment Waist (in.)
Garment Hip (in.)
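To illustrate why a plain HTTP fetch is enough here, the following is a minimal sketch that uses only Python's standard-library `html.parser`. The HTML fragment, its cell values, and the names `HeaderCollector` and `extract_headers` are made up for illustration; only the class names `size-chart-body` and `size-chart` come from the selectors above. It shows that a table hidden with CSS is still present in the page source and can be parsed like any other markup.

```python
from html.parser import HTMLParser

# Hypothetical fragment (not the real page): the chart is hidden with CSS,
# but its markup is still part of the HTML that the server returns.
SAMPLE_HTML = """
<div class="size-chart-body" style="display:none">
  <div class="size-chart">
    <table>
      <tr><th>Indian Size</th><th>Euro Size</th><th>Garment Bust (In.)</th></tr>
      <tr><td>S</td><td>36</td><td>34</td></tr>
    </table>
  </div>
</div>
"""


class HeaderCollector(HTMLParser):
    """Collects the text of every <th> cell, whether it is visible or not."""

    def __init__(self):
        super().__init__()
        self.in_th = False
        self.headers = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self.in_th = True
            self.headers.append("")

    def handle_endtag(self, tag):
        if tag == "th":
            self.in_th = False

    def handle_data(self, data):
        # Only keep text that appears inside a <th> element.
        if self.in_th:
            self.headers[-1] += data


def extract_headers(html):
    """Return the stripped text of all table header cells in the given HTML."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()
    return [header.strip() for header in parser.headers]


print(extract_headers(SAMPLE_HTML))
```

A browser is only needed when the data is fetched by a later JavaScript request rather than shipped in the initial HTML.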
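As for Koovs, you can still avoid selenium and construct the size chart URL manually by extracting the category and deal names from the page, for example: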
$ scrapy shell 'http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651'
In [1]: category = response.xpath("//input[@id='master_category_name_id_ref']/@value").extract()[0]
In [2]: deal = response.xpath("//input[@id='deal_id']/@value").extract()[0]
In [3]: "http://www.koovs.com/koovs/sizechart/women/{category}/{deal}".format(category=category, deal=deal)
Out[3]: 'http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554'
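The URL construction from the shell session can be wrapped in a small helper. This is only a sketch: the function name `build_size_chart_url` and the `section` parameter are assumptions introduced here, and the shell session only confirms the `women` path segment for this one product, so other sections may or may not follow the same pattern.

```python
# URL pattern taken from the shell session above; the {section} placeholder
# is an assumption (the session hard-codes "women").
SIZE_CHART_TEMPLATE = "http://www.koovs.com/koovs/sizechart/{section}/{category}/{deal}"


def build_size_chart_url(category, deal, section="women"):
    """Build the size chart URL from the values of the two hidden inputs.

    category: value of the input with id 'master_category_name_id_ref'
    deal:     value of the input with id 'deal_id'
    """
    return SIZE_CHART_TEMPLATE.format(section=section, category=category, deal=deal)


print(build_size_chart_url("Shirts--651--799--896", "59554"))
# prints http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554
```

In a spider, the two arguments would come from the same `response.xpath(...)` calls shown in the shell session.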
And if you still want to go with selenium, here you go:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Firefox()
driver.get('http://www.koovs.com/only-onlall-stripe-ls-shirt-59554.html?from=category-651&skuid=236376')

# Wait up to 10 seconds for the size chart link, then click it.
wait = WebDriverWait(driver, 10)
chart = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[size_chart]")))
chart.click()

# The chart opens in a new window; switch to it and read its URL.
driver.switch_to.window(driver.window_handles[-1])
print(driver.current_url)

driver.close()
Prints:
http://www.koovs.com/koovs/sizechart/women/Shirts--651--799--896/59554