while True:
    for rate in soup.find_all('div',{"class":"rating"}):
        if rate.img is not None:
            print (rate.img['alt'])
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break
driver.quit()
while True:
    for rate in soup.findAll('div',{"class":"listing_title"}):
        print (rate.a.text)
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break
driver.quit()
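Answer 0 (score: 2)
This should do what you're looking for. You should grab the parent element of both (I chose .listing), get each attribute from there, insert them into a dict, and then write the dicts to a CSV with the Python csv library. Just as a fair warning, I didn't run it until it broke; I just broke out after the second loop to save some computing.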
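import csv
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

url = 'http://www.tripadvisor.in/Hotels-g186338-London_England-Hotels.html'
driver = webdriver.Firefox()
driver.get(url)
hotels = []
while True:
    # Re-parse the page source on every iteration so each new page is scraped.
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    listings = soup.select('div.listing')
    for l in listings:
        hotel = {}
        hotel['name'] = l.select('a.property_title')[0].text
        # The alt text is expected to put the numeric rating before the word "of".
        hotel['rating'] = float(l.select('img.sprite-ratings')[0]['alt'].split('of')[0])
        hotels.append(hotel)
    # find_element raises NoSuchElementException when there is no "Next" link,
    # so catch the exception instead of testing the return value.
    try:
        next_link = driver.find_element(By.LINK_TEXT, 'Next')
    except NoSuchElementException:
        break
    next_link.click()
    time.sleep(0.5)
if len(hotels) > 0:
    # newline='' keeps the csv module from emitting blank rows on Windows.
    with open('ratings.csv', 'w', newline='') as f:
        fieldnames = [k for k in hotels[0].keys()]
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for h in hotels:
            writer.writerow(h)
driver.quit()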
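The dict-to-CSV step can be tried on its own, without a browser. The following is only a minimal sketch: the two hotel rows are made-up placeholders, and `hotels_to_csv` is a hypothetical helper name, not something from the answer. It writes to an in-memory buffer so the result can be inspected directly.

```python
import csv
import io

# Hypothetical sample rows standing in for the scraped `hotels` list.
hotels = [
    {"name": "Example Hotel A", "rating": 4.5},
    {"name": "Example Hotel B", "rating": 3.0},
]


def hotels_to_csv(rows):
    """Serialize a list of dicts that share the same keys to CSV text."""
    if not rows:
        return ""
    buffer = io.StringIO(newline="")
    # The header row is taken from the keys of the first dict.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = hotels_to_csv(hotels)
print(csv_text)
```

Swapping the buffer for `open('ratings.csv', 'w', newline='')` gives the same rows on disk.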
Answer 1 (score: 0)
You should use a list.
I would try something like this:
for rate in soup.findAll('div',{"class":["rating","listing_title"]}):
(This may be wrong; this machine doesn't have bs4 for me to check, sorry.)
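For anyone who wants to check the list idea offline, here is a small sketch. The HTML string is invented markup that only imitates the two class names from the question, so the real page may be structured differently. It relies on BeautifulSoup treating a list of attribute values as "match any of these".

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the two classes from the question.
html = """
<div class="listing_title"><a href="#">Example Hotel A</a></div>
<div class="rating"><img alt="4.5 of 5 stars"></div>
<div class="listing_title"><a href="#">Example Hotel B</a></div>
<div class="rating"><img alt="3 of 5 stars"></div>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
# A list of class values matches divs carrying either class, in document order.
for div in soup.find_all("div", {"class": ["rating", "listing_title"]}):
    classes = div.get("class", [])
    if "listing_title" in classes and div.a is not None:
        results.append(("name", div.a.text))
    elif "rating" in classes and div.img is not None:
        results.append(("rating", div.img["alt"]))

print(results)
```

Note that names and ratings come back interleaved, so a listing with no rating image would leave a name without a partner. Selecting the shared parent element, as in the other answer, avoids that pairing problem.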