Trouble printing items from different pages at the same time

Time: 2018-12-19 14:28:51

Tags: python python-3.x selenium selenium-webdriver web-scraping

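I've written a script in Python with Selenium that parses the links of different restaurants from a landing page and then scrapes the name and address of each restaurant after navigating to its target page. A few of the restaurant links carry a green Featured icon, as shown in the image below.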

Link to the landing page

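What I want to do is scrape that information (whether a restaurant is featured) from the landing page, but print it together with the name and address while my browser is on the target page.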

How can I print the name, the address, and whether the restaurant is Featured all at once in my current print command?

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def fetch_info(driver,link):
    driver.get(link)
    itemlinks = [item.get_attribute("href") for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.restaurant-header")))]

    for itemlink in itemlinks:
        driver.get(itemlink)
        name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"h1.name"))).text
        address = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".address-text-rest-menu span"))).text

        print(f'{name}\n{address}')

if __name__ == '__main__':
    url = "https://eatstreet.com/madison-wi/restaurants"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        fetch_info(driver,url)
    finally:  
        driver.quit()

Expected output (the Featured status is shown on the landing page):

Doughboy's Pizza - Cottage Grove
447 W. Cottage Grove Rd Cottage Grove WI, 53527
Not Featured

Silver Mine Subs - Beltline
2601 W Beltline Hwy Madison WI, 53713
Not Featured

Adamah Neighborhood Table
611 Langdon St Madison WI, 53703
Featured

One such Featured icon is attached to some of the links on the landing page.

[Image: a Featured icon on a landing-page link]

3 answers:

Answer 0: (score: 1)

If you want to print the name and "Featured" (if found), try

from selenium.common.exceptions import NoSuchElementException

def fetch_info(driver,link):
    driver.get(link)
    items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.restaurant-header")))
    featured = []
    for item in items:
        try:
            # Look for a "Featured" badge in the div that follows this link
            item.find_element(By.XPATH, './following-sibling::div//span[.="Featured"]')
            featured.append('Featured')
        except NoSuchElementException:
            featured.append('Not featured')
    itemlinks = [item.get_attribute("href") for item in items]

    for itemlink, is_featured in zip(itemlinks, featured):
        driver.get(itemlink)
        name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"h1.name"))).text

        print(f'{name}\n{is_featured}')
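The question also asks for the address, so the same pairing idea can carry all three values. Below is a minimal sketch of just the pairing step, with the browser replaced by a stand-in lookup so it runs offline. The function `format_results`, the `fetch_details` callback, and the short placeholder links are hypothetical names introduced for illustration; the names and addresses are copied from the question's expected output. In the real script, `fetch_details` would be the `driver.get(itemlink)` call plus the two waits.

```python
from typing import Callable, Iterable, List, Tuple


def format_results(
    landing_rows: Iterable[Tuple[str, bool]],
    fetch_details: Callable[[str], Tuple[str, str]],
) -> List[str]:
    """Pair each flag collected on the landing page with target-page details."""
    blocks = []
    for link, is_featured in landing_rows:
        # Stand-in for navigating to the target page and reading name/address
        name, address = fetch_details(link)
        status = "Featured" if is_featured else "Not Featured"
        blocks.append(f"{name}\n{address}\n{status}")
    return blocks


# Hypothetical placeholder links; values taken from the expected output above
fake_pages = {
    "/silver-mine": ("Silver Mine Subs - Beltline", "2601 W Beltline Hwy Madison WI, 53713"),
    "/adamah": ("Adamah Neighborhood Table", "611 Langdon St Madison WI, 53703"),
}
rows = [("/silver-mine", False), ("/adamah", True)]

for block in format_results(rows, fake_pages.__getitem__):
    print(block, end="\n\n")
```

The point of collecting the flags before the loop is that the landing-page elements are no longer usable once the browser has navigated away, so anything needed from them has to be stored as plain values first.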

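Answer 1: (score: 0)

Something like the following? I've parsed the required information into a list that you can then loop over, navigating to each page as needed and printing once you are on the page if required.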

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import re

url = 'https://eatstreet.com/madison-wi/restaurants'
d  = webdriver.Chrome()
d.get(url)
featured = ['featured' if re.search(r'ng-if="::restaurant\.featured"',ad.get_attribute('innerHTML')) is not None else 'No' for ad in WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".rest-list-information")))]
titles = [[title.text, title.get_attribute('href')] for title in d.find_elements(By.CSS_SELECTOR, ".rest-list-information a")]
results = list(zip(titles,featured))
for result in results:
#     if result[1] == 'featured':
#         print(result[0][1]) #navigate if required etc
    print(result[0][0], result[1])
    #d.get(result[0][1])  ##do what you want here
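The marker test in this answer can be tried on its own, without a browser. The sketch below applies the same `ng-if` pattern to two hand-written `innerHTML` strings; those strings and the helper name `featured_label` are assumptions made for illustration, not markup copied from the site.

```python
import re

# Raw string so that \. is a regex escape rather than a Python string escape
FEATURED_MARKER = re.compile(r'ng-if="::restaurant\.featured"')


def featured_label(inner_html: str) -> str:
    """Return 'featured' when the Angular marker is present, otherwise 'No'."""
    return "featured" if FEATURED_MARKER.search(inner_html) else "No"


# Hypothetical innerHTML samples standing in for ad.get_attribute('innerHTML')
samples = [
    '<a href="/x">X</a><div ng-if="::restaurant.featured"><span>Featured</span></div>',
    '<a href="/y">Y</a>',
]
print([featured_label(s) for s in samples])  # ['featured', 'No']
```

Since the pattern is a fixed string, a plain substring check (`'ng-if="::restaurant.featured"' in inner_html`) would likely behave the same way here.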

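Answer 2: (score: 0)

You should look for the div that contains both the restaurant link and the associated "Featured" button, rather than only the restaurant link: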

<div class="rest-list-information">
  <a href="/madison-wi/restaurants/adamah-neighborhood-table-madison">Adamah Neighborhood Table</a>
  <div class="featured-border featured-border--green featured-border-left" style="">
    <span>Featured</span>
  </div>
</div>

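That way you can get both related items: the restaurant name and the "Featured" button.

Note: untested. I don't remember the Selenium/Python syntax very well, but it should give you a start.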

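from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Each container holds the restaurant link and, optionally, the Featured badge
restaurants = driver.find_elements(By.CLASS_NAME, "rest-list-information")

for restaurant in restaurants:
    # The link text inside the container is the restaurant name
    restaurant_name = restaurant.find_element(By.TAG_NAME, "a").text
    try:
        featured = restaurant.find_element(By.CSS_SELECTOR, "div[class*='featured-border--green']").text
    except NoSuchElementException:
        featured = "No"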
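The container-based lookup can be checked offline against the HTML fragment quoted in this answer. The sketch below uses the standard library's `html.parser` in place of a live browser, so it only illustrates the idea of reading the name and the badge from the same `rest-list-information` div. The class `RestaurantParser` and the second, non-featured entry are hypothetical additions for the example.

```python
from html.parser import HTMLParser


class RestaurantParser(HTMLParser):
    """Collect (name, featured) pairs from rest-list-information containers."""

    def __init__(self):
        super().__init__()
        self.results = []       # list of (name, "Featured" or "No")
        self._depth = 0         # div nesting depth inside the current container
        self._in_link = False
        self._name = ""
        self._featured = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div":
            if self._depth:
                self._depth += 1
                if "featured-border--green" in classes:
                    self._featured = True
            elif "rest-list-information" in classes:
                # Entering a new container: reset the per-restaurant state
                self._depth = 1
                self._name, self._featured = "", False
        elif tag == "a" and self._depth:
            self._in_link = True

    def handle_data(self, data):
        if self._in_link:
            self._name += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                status = "Featured" if self._featured else "No"
                self.results.append((self._name.strip(), status))


# First container is the fragment from the answer; the second is a made-up entry
html = """
<div class="rest-list-information">
  <a href="/madison-wi/restaurants/adamah-neighborhood-table-madison">Adamah Neighborhood Table</a>
  <div class="featured-border featured-border--green featured-border-left" style="">
    <span>Featured</span>
  </div>
</div>
<div class="rest-list-information">
  <a href="/madison-wi/restaurants/example">Example Restaurant</a>
</div>
"""

parser = RestaurantParser()
parser.feed(html)
print(parser.results)
```

Because the name and the badge are read from one container, there is no need to line up two separately built lists afterwards.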