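I've written a script in Python with Selenium that parses the links of different restaurants from a landing page and then, after navigating to each restaurant's target page, scrapes its name and address. A few restaurants have a green Featured icon attached to their links, as shown in the image below.
What I want to do is scrape that information (whether a restaurant is featured) from the landing page, but print it together with the name and address while my browser is on the target page.
How can I print the name, the address, and whether the restaurant is Featured at the same time in my current print command?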
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def fetch_info(driver,link):
    driver.get(link)
    itemlinks = [item.get_attribute("href") for item in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.restaurant-header")))]
    for itemlink in itemlinks:
        driver.get(itemlink)
        name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"h1.name"))).text
        address = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,".address-text-rest-menu span"))).text
        print(f'{name}\n{address}')

if __name__ == '__main__':
    url = "https://eatstreet.com/madison-wi/restaurants"
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        fetch_info(driver,url)
    finally:
        driver.quit()
Expected result (with the Featured status as shown on the landing page):
Doughboy's Pizza - Cottage Grove
447 W. Cottage Grove Rd Cottage Grove WI, 53527
Not Featured
Silver Mine Subs - Beltline
2601 W Beltline Hwy Madison WI, 53713
Not Featured
Adamah Neighborhood Table
611 Langdon St Madison WI, 53703
Featured
One such Featured icon, as attached to some of the links on the landing page.
Answer 0 (score: 1)
If you want to print the name and "Featured" (if found), try:
from selenium.common.exceptions import NoSuchElementException

def fetch_info(driver,link):
    driver.get(link)
    items = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"a.restaurant-header")))
    # Record the badge state while still on the landing page
    featured = []
    for item in items:
        try:
            item.find_element(By.XPATH, './following-sibling::div//span[.="Featured"]')
            featured.append('Featured')
        except NoSuchElementException:
            featured.append('Not featured')
    itemlinks = [item.get_attribute("href") for item in items]
    for itemlink, is_featured in zip(itemlinks, featured):
        driver.get(itemlink)
        name = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR,"h1.name"))).text
        print(f'{name}\n{is_featured}')
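The same idea extends to the address the question asks for. What follows is a minimal, untested sketch, assuming the selectors from the question and the XPath from this answer still match the page. The helper name `format_record` is hypothetical, and `wait` is passed in as a parameter here rather than read from a global:

```python
def format_record(name, address, is_featured):
    """Return the three-line block shown in the question's expected output."""
    status = "Featured" if is_featured else "Not Featured"
    return f"{name}\n{address}\n{status}"


def fetch_info(driver, wait, link):
    """Print name, address and featured status for every listed restaurant."""
    # Imported lazily so format_record can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    driver.get(link)
    items = wait.until(EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, "a.restaurant-header")))

    # Collect href and badge state while still on the landing page: the
    # element references go stale once the driver navigates away.
    listing = []
    for item in items:
        # find_elements returns an empty list instead of raising when nothing matches.
        badges = item.find_elements(
            By.XPATH, './following-sibling::div//span[.="Featured"]')
        listing.append((item.get_attribute("href"), bool(badges)))

    for href, is_featured in listing:
        driver.get(href)
        name = wait.until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, "h1.name"))).text
        address = wait.until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, ".address-text-rest-menu span"))).text
        print(format_record(name, address, is_featured))
```

It would be called as `fetch_info(driver, wait, url)` from the question's `__main__` block.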
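Answer 1 (score: 0)
Something like the following? I have parsed the required info into a list, which you can then loop over and navigate to as required. Print on the page if wanted.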
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import re

url = 'https://eatstreet.com/madison-wi/restaurants'
d = webdriver.Chrome()
d.get(url)
# Flag each listing container whose markup carries the featured ng-if marker
featured = ['featured' if re.search(r'ng-if="::restaurant\.featured"',ad.get_attribute('innerHTML')) is not None else 'No' for ad in WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".rest-list-information")))]
titles = [[title.text, title.get_attribute('href')] for title in d.find_elements(By.CSS_SELECTOR, ".rest-list-information a")]
results = list(zip(titles,featured))
for result in results:
    # if result[1] == 'featured':
        # print(result[0][1]) #navigate if required etc
    print(result[0][0], result[1])
    #d.get(result[0][1]) ##do what you want here
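Answer 2 (score: 0)
You should look for the div that contains both the restaurant link and the associated "Featured" button, rather than just the restaurant link: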
<div class="rest-list-information">
    <a href="/madison-wi/restaurants/adamah-neighborhood-table-madison">Adamah Neighborhood Table</a>
    <div class="featured-border featured-border--green featured-border-left" style="">
        <span>Featured</span>
    </div>
</div>
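This way you can get both related items: the restaurant name and the "Featured" button.
Note: untested. I don't remember the Selenium/Python syntax very well, but it should give you a start.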
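from selenium.common.exceptions import NoSuchElementException

restaurants = driver.find_elements(By.CLASS_NAME, "rest-list-information")
for restaurant in restaurants:
    # The link inside the container carries the restaurant name and URL
    link = restaurant.find_element(By.TAG_NAME, "a")
    restaurant_name = link.text
    try:
        featured = restaurant.find_element(By.CSS_SELECTOR, "div[class*='featured-border--green']").text
    except NoSuchElementException:
        featured = "No"
    print(restaurant_name, featured)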
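The container-based lookup can be tried offline, without a browser, against the HTML snippet quoted in this answer. The sketch below uses only the standard library and works only because that snippet happens to be well-formed XML; the live page's markup may differ, and `parse_container` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

# The container markup quoted in this answer.
SNIPPET = """
<div class="rest-list-information">
  <a href="/madison-wi/restaurants/adamah-neighborhood-table-madison">Adamah Neighborhood Table</a>
  <div class="featured-border featured-border--green featured-border-left" style="">
    <span>Featured</span>
  </div>
</div>
"""


def parse_container(markup):
    """Return (name, href, is_featured) for one rest-list-information div."""
    container = ET.fromstring(markup.strip())
    link = container.find("a")
    # The badge is a div whose class list includes the green border class.
    is_featured = any(
        "featured-border--green" in div.get("class", "").split()
        for div in container.iter("div")
    )
    return link.text, link.get("href"), is_featured


print(parse_container(SNIPPET))
```

Because the name, the link, and the badge are all read from the same container, the three values cannot drift out of step the way two separately built lists can.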