Appending scraped data to different columns

Time: 2015-09-26 18:09:47

Tags: python selenium web-scraping beautifulsoup

while True:
    for rate in soup.find_all('div', {"class": "rating"}):
        if rate.img is not None:
            print(rate.img['alt'])
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break

driver.quit()


while True:
    for rate in soup.findAll('div', {"class": "listing_title"}):
        print(rate.a.text)
    try:
        driver.find_element_by_link_text('Next').click()
    except:
        break

driver.quit()

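2 answers:

Answer 0 (score: 2)

This should do what you are looking for. Grab the parent element of both values (I chose .listing), pull each attribute from there, insert them into a dict, and then write the dicts to a CSV file with Python's csv library. As a fair warning, I did not let it run until it broke; I just broke out after the second loop to save some computation.

Warning: not tested on the full site.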

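import csv
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

url = 'http://www.tripadvisor.in/Hotels-g186338-London_England-Hotels.html'

driver = webdriver.Firefox()
driver.get(url)

hotels = []

while True:
    # Re-parse the page source on every iteration so each new page is scraped.
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    listings = soup.select('div.listing')

    for listing in listings:
        hotel = {}
        hotel['name'] = listing.select('a.property_title')[0].text
        # The alt text starts with the score, so keep the part before 'of'.
        hotel['rating'] = float(listing.select('img.sprite-ratings')[0]['alt'].split('of')[0])
        hotels.append(hotel)

    # find_element raises NoSuchElementException rather than returning None,
    # so the last page is detected with try/except.
    try:
        next_link = driver.find_element(By.LINK_TEXT, 'Next')
    except NoSuchElementException:
        break
    next_link.click()
    time.sleep(0.5)

if len(hotels) > 0:
    # newline='' stops the csv module from writing blank rows on Windows.
    with open('ratings.csv', 'w', newline='') as f:
        fieldnames = list(hotels[0].keys())
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(hotels)

driver.quit()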
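
The dict-to-CSV step can be tried without a browser. This is only a minimal sketch: it assumes the alt text looks like "4.5 of 5 stars", the hotel names and ratings are made-up sample values, and `parse_rating` and `rows_to_csv` are hypothetical helper names that do not appear in the answer.

```python
import csv
import io


def parse_rating(alt_text):
    """Return the leading number from alt text such as '4.5 of 5 stars'."""
    return float(alt_text.split('of')[0])


def rows_to_csv(rows, fieldnames):
    """Write a list of dicts to CSV text, one column per field name."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for scraped values (assumption).
scraped = [
    ('Hotel A', '4.5 of 5 stars'),
    ('Hotel B', '3 of 5 stars'),
]
hotels = [{'name': name, 'rating': parse_rating(alt)} for name, alt in scraped]
csv_text = rows_to_csv(hotels, ['name', 'rating'])
print(csv_text)
```

Because each hotel is one dict, its name and rating always land on the same row, with one column per key.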

Answer 1 (score: 0)

You should use a list.

I would try something like this:

for rate in soup.findAll('div',{"class":["rating","listing_title"]}):

(This might be wrong; this machine does not have bs4 for me to check, sorry.)
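
For what it is worth, Beautiful Soup does accept a list as an attribute value and matches elements carrying any of the listed classes, returned in document order. The sketch below checks this against a small hand-written HTML fragment; the markup is an assumption for illustration, not the real page, and the pairing logic assumes each title div comes before its rating div.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the page markup (assumption, not the real site).
html = """
<div class="listing">
  <div class="listing_title"><a href="#">Hotel A</a></div>
  <div class="rating"><img alt="4.5 of 5 stars"></div>
</div>
<div class="listing">
  <div class="listing_title"><a href="#">Hotel B</a></div>
  <div class="rating"><img alt="3 of 5 stars"></div>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
# A list value matches divs that have either class.
for div in soup.find_all('div', {'class': ['rating', 'listing_title']}):
    classes = div.get('class', [])
    if 'listing_title' in classes:
        # Start a new row when a title is seen.
        rows.append({'name': div.a.text, 'rating': None})
    elif 'rating' in classes and div.img is not None and rows:
        # Attach the rating to the most recent title.
        rows[-1]['rating'] = div.img['alt']
print(rows)
```

Matching both classes in one pass returns a mixed sequence, so the rows still have to be reassembled by position. Selecting a shared parent element per hotel, as in the other answer, avoids that bookkeeping.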