Question

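I am trying to scrape Fodor's website for all of its Fodor's Choice restaurants in Amsterdam. I can extract the name, price range, and cuisine type from the initial page that the link takes you to, but I am having trouble pulling the "href" attribute to get to the other pages I want to scrape.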

# import libraries
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import time
from selenium.webdriver.common.keys import Keys

base_url = "http://www.fodors.com/world/europe/netherlands/amsterdam/restaurants/fodors-choice/"
# create the browser once instead of once per page
driver = webdriver.PhantomJS()

# loop for the multiple pages (range(1, 2) currently covers page 1 only)
for page in range(1, 2):
    driver.get(base_url + str(page))
    # restaurant names
    for nameelem in driver.find_elements_by_tag_name("h2"):
        print(nameelem.text)
    # price range and cuisine type
    for priceandcuisineelem in driver.find_elements_by_class_name("keywords"):
        print(priceandcuisineelem.text)
    # pull link to scrape other pages (this prints every href on the page)
    for restpage in driver.find_elements_by_tag_name("a"):
        print(restpage.get_attribute("href"))

# close the browser when done
driver.quit()

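The problem is that, with my current code, I am pulling every href on the page. I only want the ones that belong to the restaurants I am scraping.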
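One way to narrow the output, sketched here under assumptions, is to collect every href as before and keep only those whose URL path contains a marker that identifies a restaurant detail page. The function name `filter_restaurant_links` and the marker value `"/restaurants/reviews/"` are hypothetical; the real marker would have to be read off the actual links on the site. Another option might be to search for the anchor inside each restaurant's own container element, but that depends on page markup not shown here.

```python
from urllib.parse import urlparse


def filter_restaurant_links(hrefs, path_marker):
    """Return the unique hrefs whose URL path contains path_marker, in first-seen order.

    path_marker is an assumption about how restaurant pages are addressed;
    inspect the real links to choose it.
    """
    seen = set()
    kept = []
    for href in hrefs:
        # get_attribute("href") returns None for anchors that have no href
        if not href:
            continue
        if path_marker in urlparse(href).path and href not in seen:
            seen.add(href)
            kept.append(href)
    return kept


# Hypothetical use with the loop above:
# hrefs = [a.get_attribute("href") for a in driver.find_elements_by_tag_name("a")]
# for link in filter_restaurant_links(hrefs, "/restaurants/reviews/"):
#     driver.get(link)
```

Filtering on the parsed path rather than the whole URL string keeps query strings and fragments from matching by accident, and dropping duplicates avoids visiting the same restaurant page twice when it is linked from both its image and its name.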

Python / Selenium: scraping links and then scraping the linked pages

0 answers: